A method is described for identifying a rather small set of extendible primers for use in the identification, typing or classification 
of a nucleic acid of known sequence having known polymorphisms. A matrix of primers and pairs of primer extensions is prepared and 

PRIMERS FOR IDENTIFYING, TYPING OR 
CLASSIFYING NUCLEIC ACIDS 


DNA-sequence analysis is rapidly becoming a standard tool 
in modern molecular biology research. Examples of applications include: 
sequencing of unknown DNA-sequences, identifying novel genes in 
stretches of sequenced DNA, predicting protein-sequence and -structure 
from DNA-sequence alone, and identification of known gene-variations 
(sometimes called "typing a gene"). 

Typing of a gene could be crucial in some applications. For 
instance, organ-donation requires that the "immunological signature" of the 
donor matches that of the receiver. This "signature" is mediated by the 
Human Leucocyte Antigen (HLA) complexes (also known as Major 
Histocompatibility Complex, MHC) on the cell surface, and the 
corresponding genes are among the most varied in the human genome. 
Considering the importance of organ donation, the shortage of 
organ-donors and the fact that an organ cannot be stored for any longer 
time-periods, a rapid and accurate typing of the HLA-genes is required in 
order to make most use of the organs available for transplantation. 

Another application where a rapid and accurate identification 
of a gene is desired is when trying to identify unknown bacteria. A rapid 
identification of the bacteria causing the illness of a patient makes it 
possible to administer the correct medication early in the treatment of the 
disease, thus reducing the discomfort for the patient. Since every 
self-replicating organism so far studied uses ribosomes when translating 
mRNA to proteins, analysis of one of the genes coding for the ribosome, for 
instance the 16S rRNA in the case of prokaryotes, could be used to identify 
the organism in question. 

There are several ways in which a gene can be identified, 
with the conceptually easiest being to sequence the entire gene and then 
look at the result. The main drawback is that this approach is 
time-consuming, and not easily scaled up using conventional methodology. A 
new method, Arrayed Primer Extension (APEX), lacks this drawback. 
APEX works by immobilising a large number of primers to a solid surface, 
thus creating a DNA-chip. These primers are constructed to be 
consecutively overlapping over the entire gene of interest, so that every 
base in the gene will have a primer to its 5'-end. By adding fluorescently 
labelled dideoxynucleotides, the primers will then be extended by one 
nucleotide using the sample DNA as template. It will thus be easy to check 
which nucleotide was incorporated, which in turn tells you the entire 
sequence of the sample DNA. 

Since some genes, like the HLA and 16S rRNA, have a large 
number of known variations, a prohibitively large number of primers have to 
be created in order to probe for all possible combinations of variant 
positions in the gene. Thus the array primer extension method APEX for 
resequencing would need more than 16,000 primers if all DQB alleles 
were to be sequenced from a 500 bp long PCR fragment. If all DQB alleles 
were to be combined in pairs, which would be the situation for a 
heterozygote, as found in most individuals, the number of primers might be 
even higher. 

But this might not be necessary, if some variations always or 
never occur together. This needs to be studied though, and a way found to 
determine the least number of primers (and what their sequences are) 
required for unambiguously identifying those genes. 

An object of this invention is to find and implement an efficient 
algorithm capable of doing just that. The algorithm should preferably also 
take into account the melting points of the primers, so that the extension 
reaction can take place under optimal conditions for all of the primers on 
the chip. It should also minimise the number of "self-extended" primers, i.e. 
primers that can extend themselves without any sample DNA. This 
algorithm is then to be tested and evaluated on the HLA and 16S rRNA 
genes. HLA is chosen partly because of the importance of rapid typing of 
these genes, which means that there are many other methods to 
which APEX can be compared. It is also because the HLA-genes are 
"easy" to work with, since they rarely contain any insertions or deletions. 
These kinds of variations in the gene could potentially create problems 
when designing primers for APEX. The 16S rRNA, on the other hand, 
contains insertions and deletions and can thus be used to see if the 
algorithm can handle such variations. 

The invention provides a method of identifying a set of 
extendible primers for use in the identification, typing or classification of a 
nucleic acid of known sequence having known polymorphisms wherein: 

i) all possible nucleotide sequences of a chosen length of the 
nucleic acid, and their corresponding extendible primers, are identified, 

ii) at least one extendible primer is removed from the set, 
wherein the at least one primer removed identifies a segment of the nucleic 
acid identified by at least one other primer. 

Preferably the method includes between step i) and ii): 

ia) potential extensions for each primer are identified with 
respect to each nucleotide sequence, 

ib) for each extendible primer the identified potential extensions 
are compared to determine which pairs of sequences can be discriminated 
by the primer. 

Preferably a matrix of primers and pairs of primer extensions 
is prepared in binary form and is subjected to analysis by a set covering 
problem (SCP) algorithm as described in more detail below. 

The invention also includes a set of extendible primers, for 
use in the identification, typing or classification of a nucleic acid of known 
sequence having known polymorphisms, identified by the method as 
defined. Preferably the primers are attached by 5'-ends to a surface of a 
support on which they are presented in the form of an array. 

In another aspect, the invention provides a set of extendible 
primers, for use in the identification, typing or classification of a human 
leucocyte antigen (HLA) gene as indicated, the set comprising about the 
number of primers indicated and being capable of distinguishing about the 
number of alleles indicated: 





            HLA gene    Number of Alleles    Number of Primers

Class I     HLA-A              91                  172
            HLA-B             200                <1000
            HLA-C              47                   94

Class II    DPA-1              11                   26
            DPB-1              74                  130
            DQA-1              17                  130
            DQB-1              34                   84
            DRB-1             192                <1000
            DRB345             35                   94



In another aspect, the invention provides a set of extendible 
primers, for use in the identification, typing or classification of 16S rRNA, 
wherein the set comprises about 210 primers and is capable of 
distinguishing at least about 1207 different sequences. 

In these aspects of the invention, the approximate number of 
primers is indicated. As indicated below, it may be possible by the use of 
the algorithms exemplified or other algorithms to generate slightly smaller 
sets of primers capable of distinguishing the number of alleles or 
sequences indicated, and these sets are envisaged according to the 
invention. Of course, other primers may be present in addition to those 
indicated as essential, and may be useful for checking purposes. The 
number of alleles or sequences indicated represents the approximate 
known number of polymorphisms or different sequences, and these will 
surely increase with time. 

In another aspect the invention provides a method of 
identification, typing or classification of a nucleic acid of known sequence 
having known polymorphisms, by the use of the set of extendible primers 
as defined, which method comprises applying the nucleic acid or fragments 
thereof to the set of extendible primers under hybridisation conditions and 
effecting template-directed chain extension of extendible primers that have 
formed hybrids. Preferably template-directed chain extension is effected 
using four different fluorescently labelled chain-terminating nucleotide 
analogues, and results are analysed by an imaging system such as total 
internal reflection fluorescence (TIRF) or scanning confocal microscopy. 
The various steps of the method may be performed as described in the 
literature for the known APEX technique. 

In another aspect the invention provides a kit for use in the 
identification, typing or characterisation of a nucleic acid of known 
sequence having known polymorphisms, comprising the set of extendible 
primers as defined. 

In another aspect the invention provides an array of sets of 
extendible primers as defined, for the simultaneous identification, typing or 
classification of two or more different HLA genes. 

With the present invention it has been realised that where a 
number of different alleles are to be identified, the total number of primers 
required to distinguish each of the alleles could be reduced as some 
primers would be common to all of the alleles, for example. Thus, with the 
present invention complete sets of primers for identification of each allele 
are identified and then the total number of primers in the combined sets is 
reduced using predetermined rules. 

Furthermore the present invention is based on the premise 
that as the primers are used to identify the presence or absence of a 
particular nucleotide sequence in any allele, the specific nucleotide that 
extends any particular primer is of less relevance than simply whether the 
primer has been extended. Thus, the problem of reducing the overall 
number of primers is greatly simplified, rendering the problem one suitable 
for treatment as a Set Covering Problem (SCP). 

Embodiments of the present invention will now be described 
by way of example with reference to the accompanying drawings and 
examples, in which: 

Figure 1 is a diagram of a signal matrix in accordance with 
the present invention; 

Figure 2 is a diagram of the corresponding binary matrix for 
the signal matrix of Figure 1; 

Figure 3 is a flow diagram of the steps for reducing the primer 
set in accordance with the present invention. 

The following is an explanation to assist in an understanding 
of the principles underlying the manner in which the number of primers 
used in the identification of a plurality of sequences may be reduced. 

Theoretically the number of primers required to identify k 
sequences grows as O(k·l), where l is the length of the sequences, as each 
sequence requires l primers. However, the less the sequences differ from 
one another, the fewer primers are required as many of the primers 
required for identification of a first sequence may also be of use in 
identification of another sequence. This effect becomes more pronounced 
the greater the number of sequences to be identified and the greater the 
similarities. 

Considering an initial set of n primers required in the 
identification of k sequences, a signal matrix of k x n can be constructed. 
Each element in the matrix represents the signal, if any, that is generated 
by a particular primer with respect to a particular sequence. The signal will 
either be one of the four nucleotides 'A', 'C', 'G' or 'T', or no signal. 
Figure 1 is an example of such a signal matrix where, for example, the 
signal generated by primer 2 with respect to sequence 3 is 'T'. 

The signal matrix is then converted into a binary matrix that 
represents whether the signals for any particular primer differ with respect 
to different sequences. Thus, again with respect to primer 2, the same 
signal 'G' is generated for both sequences 1 and 2 but a different signal 'T' 
is generated with respect to sequence 3. The binary matrix is constructed 
by considering each column (each primer) of the signal matrix and 
comparing each signal in that column in turn. Thus, as shown in Figure 2, 
the first row of the matrix represents a comparison of the signals for the first 
and second sequences, the second row represents a comparison of the 
signals for the first and third sequences and the third row represents a 
comparison of the signals for the second and third sequences. Binary '0' 
represents a comparison revealing the same signal and binary '1' 
represents a comparison revealing different signals. In the case of primer 
2, as mentioned earlier, the signals for the first and second sequences are 
the same ('0') whereas the signals for the first and third sequences are 
different ('1'). This conversion produces a matrix m x n where m = k(k-1)/2. 
Hence, for large numbers of sequences, 2m grows approximately as the 
square of the number of sequences. Figure 2 shows the binary matrix for 
the signal matrix of Figure 1. 
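
As a minimal sketch of this conversion (not the implementation used for the 
invention), the following Python fragment builds the binary matrix from a 
small signal matrix. The function name `binary_matrix` and the 3 x 4 signal 
matrix are assumptions made for illustration: only the column for primer 2 
('G', 'G', 'T') follows the description above, and '-' is used here to denote 
"no signal". 

```python
from itertools import combinations


def binary_matrix(signal):
    """Convert a k x n signal matrix into an m x n binary matrix, m = k(k-1)/2.

    signal[s][p] is the signal of primer p on sequence s:
    'A', 'C', 'G', 'T', or '-' for no signal (an assumed convention).
    """
    k, n = len(signal), len(signal[0])
    pairs = list(combinations(range(k), 2))  # (0, 1), (0, 2), (1, 2), ...
    rows = [
        [int(signal[a][p] != signal[b][p]) for p in range(n)]
        for a, b in pairs
    ]
    return pairs, rows


# Hypothetical signal matrix: rows are sequences 1-3, columns are primers 1-4.
signal = [
    ["A", "G", "C", "-"],
    ["C", "G", "C", "-"],
    ["C", "T", "C", "A"],
]
pairs, rows = binary_matrix(signal)
for (a, b), row in zip(pairs, rows):
    print(f"sequences {a + 1} vs {b + 1}: {row}")
```

For k = 3 sequences the sketch yields m = 3 rows, one per pair of sequences, 
in the same pair order as described for Figure 2. 
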
As the primers are required to enable the differentiation of 
sequences from one another, the reduction of the signal matrix to a binary 
matrix, representing differences in the signals obtained for different 
sequences, distils that element of information necessary to enable a 
selection of the minimum number of primers necessary to identify the 
individual sequences. From the binary matrix the least number of columns 
are selected such that each row contains at least one non-zero element. 
Thus, if one of the columns contained all '1's, only that one column would 
be required. However, in the case of Figure 2, there is no single column 
containing all '1's and so two columns must be selected, for example 
primers 1 and 2. Primers 1 and 2 together enable each of sequences 1, 2 
and 3 to be differentiated and so the remaining primers are redundant. 
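
For a matrix as small as this one the selection can be checked exhaustively. 
The sketch below tries column subsets of increasing size and returns all 
smallest subsets that leave no all-zero row. The function name 
`smallest_covers` and the example matrix are illustrative assumptions, and 
this brute-force search is only practical for a handful of primers. 

```python
from itertools import combinations


def smallest_covers(rows):
    """Return every smallest set of columns that leaves no all-zero row."""
    n = len(rows[0])
    for size in range(1, n + 1):
        found = [
            cols
            for cols in combinations(range(n), size)
            if all(any(row[c] for c in cols) for row in rows)
        ]
        if found:
            return found
    return []  # some row has no non-zero element at all


# Hypothetical binary matrix: 3 sequence pairs (rows) x 4 primers (columns).
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
covers = smallest_covers(rows)
print(covers)  # column indices are zero-based
```

With this assumed matrix no single column suffices, and the smallest covers 
are columns (0, 1) and (0, 3), i.e. primers 1 and 2, or primers 1 and 4. 
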

Where large numbers of sequences and primers are involved, 
the binary matrix renders the data contained within that matrix suitable for 
mathematical analysis. Once the selection of the reduced number of 
primers has been made, though, it is the signal matrix that is required 
during the use of the primers in the identification of the different sequences. 
Thus, the signal matrix is used to 'decode' the results of any analysis using 
the reduced number of primers. 
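
The decoding step can be illustrated with a short sketch, assuming an exact 
match between observed and expected signals. The function name `decode`, 
the signal matrix and the use of '-' for "no signal" are hypothetical. 

```python
def decode(signal, selected, observed):
    """Return indices of all sequences consistent with the observed signals.

    signal[s][p]  expected signal of primer p on sequence s
    selected      column indices of the reduced primer set
    observed      signals read from the chip, in the same order as `selected`
    """
    return [
        s
        for s, row in enumerate(signal)
        if all(row[p] == obs for p, obs in zip(selected, observed))
    ]


# Hypothetical signal matrix (sequences 1-3 x primers 1-4).
signal = [
    ["A", "G", "C", "-"],
    ["C", "G", "C", "-"],
    ["C", "T", "C", "A"],
]
selected = [0, 1]  # reduced set: primers 1 and 2
print(decode(signal, selected, ["C", "T"]))
```

An empty result would indicate that the observed pattern matches none of the 
known sequences. 
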

In practice, large numbers of sequences and primers are 
involved and the selection of a reduced set of primers cannot be performed 
by simple inspection of the binary matrix. For large numbers of primers, 
selection of a suitable reduced set of primers can be performed by treating 
the selection as a Set Covering Problem (SCP). An SCP is an integer 
optimisation problem and is well known in fields such as airline crew 
scheduling, selecting manufacturing equipment and ingot mould selection 
in steel production. In such large scale problems that cannot be solved 
exactly (NP-hard), heuristics are used in order to generate a solution. As an 
SCP is NP-hard, global algorithms and algorithms that identify local optima 
are not very suitable on their own for a large scale SCP. They will simply 
require far too much computation, as they try to find a solution that can be 
proven to be at least locally optimal. For this reason heuristic methods are 
required instead. They do not claim to give even locally optimal solutions, 
but are much faster. 

Two known computational methods that have been found to 
be effective in identifying reduced sets of primers are the 'greedy' algorithm 
and the Lagrangian relaxation algorithm. 


Greedy Algorithm 

The simplest heuristic is the greedy algorithm, where 
columns are added one at a time. The column to be added in each step is 
chosen so as to cover as many uncovered rows as possible (a row is 
covered if it has at least one non-zero element). In other words, if $S^r$ is the 
set of columns already included in the solution at iteration r, and $R^r$ is the 
set of rows with no non-zero elements at iteration r, column $j^*$ is selected 
according to: 

$$ j^* = \arg\min_{j \notin S^r} \frac{c_j}{P_j} $$
Equation 1 

where $c_j$ is the cost of column j and $P_j$ is the number of rows in $R^r$ 
covered by column j. 

This continues until all rows are covered, or until no more 
columns exist which can cover any of the rows still uncovered. Instead of 
minimising the term $c_j/P_j$, other terms can be used. Example terms are $c_j$, 
$c_j/\log_2 P_j$ or $c_j/(P_j)^2$. Greedy algorithms of this type are described in "An 
Efficient Heuristic for Large Set Covering Problems", Vasko, Wilson, Naval 
Research Logistics Quarterly 1984, 31:163-171, the contents of which are 
incorporated herein by reference. The difference is in how much emphasis 
to place on the cost of the column versus how many rows the column 
covers. It is shown, however, that this entire class of heuristics shares the 
same worst case behaviour. If we denote the set of columns in the solution 
as S and the solution value as Z, then the worst case behaviour can be 
described as: 



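$$ \frac{Z}{Z_{\mathrm{opt}}} \le H(d) $$
Equation 2 

where $Z_{\mathrm{opt}}$ is the value of the optimal solution and 

$$ Z = \sum_{j \in S} c_j, \qquad H(d) = \sum_{j=1}^{d} \frac{1}{j}, \qquad d = \max_{j} \sum_{i=1}^{m} a_{ij} $$
Equation 3 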



In other words, how much worse the heuristic solution is 
compared to the optimal solution depends on the maximum number of 
non-zero elements in the columns. The advantage is that this algorithm is 
fast, even though its time complexity is O(m²·n) (there can be a maximum of 
m columns in the solution, i.e. the maximum number of iterations is m, and for 
each iteration the m x n matrix is traversed once to find the next column to 
be added). Altogether, the time required to solve the problem in 
the worst case scenario will grow as the number of sequences to the power 
of five (four due to the number of rows, and one due to the number of 
columns). In the case of 16S rRNA (see later), where we have ~1000 
sequences, the matrix will have ~500,000 rows. The number of primers 
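
The greedy selection of Equation 1 can be sketched as follows. This is an 
illustrative fragment under stated assumptions, not the implementation used 
for the invention: the function name `greedy_cover`, the unit default costs, 
the tie-breaking rule (the first column with the lowest ratio wins) and the 
example matrix are all hypothetical. 

```python
def greedy_cover(rows, cost=None):
    """Greedy set cover: repeatedly add the column minimising c_j / P_j.

    rows[i][j] is 1 if column (primer) j covers row (sequence pair) i.
    P_j is the number of still-uncovered rows that column j covers.
    Returns the chosen columns and the set of rows left uncovered.
    """
    m, n = len(rows), len(rows[0])
    cost = cost or [1.0] * n  # assumption: unit cost per primer
    uncovered = set(range(m))
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for j in range(n):
            if j in chosen:
                continue
            p_j = sum(1 for i in uncovered if rows[i][j])
            if p_j and cost[j] / p_j < best_ratio:
                best, best_ratio = j, cost[j] / p_j
        if best is None:
            break  # no remaining column covers any uncovered row
        chosen.append(best)
        uncovered -= {i for i in uncovered if rows[i][best]}
    return chosen, uncovered


# Hypothetical binary matrix: 3 sequence pairs x 4 primers.
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
chosen, uncovered = greedy_cover(rows)
print(chosen, uncovered)
```

Each pass over the matrix costs on the order of m x n operations and at most 
m passes are needed, which is where the worst-case growth discussed above 
comes from. 
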
Lagrangian relaxation 

More sophisticated methods exist, which use other kinds of 
heuristics. The heuristics believed to be capable of generating the 
solutions closest to the optimum are Lagrangian relaxation heuristics, where 
in each iteration the Lagrange multipliers are used to 
calculate the Lagrangian cost for the columns. Such a Lagrangian 
relaxation heuristic is described in "A Heuristic Method for the Set Covering 
Problem", Caprara et al, Technical Report OR-95-8, Operations Research 
Group, University of Bologna 1995, the content of which is incorporated 
herein by reference. A near optimal vector of these costs is then calculated 
by a subgradient algorithm, before being used as input to a greedy 
algorithm. This is repeated until no improvements in the solution can be 
made. 

In Lagrangian subgradient methods the Lagrangian of the 
original problem is considered instead of the original problem. In this case, 
the Lagrangian will be 

$$ L(u) = \min_{x \in \{0,1\}^n} \left[ \sum_{j=1}^{n} c_j(u)\, x_j + \sum_{i=1}^{m} u_i \right] $$
Equation 4 

where $x_j$ indicates whether column j is selected and $u_i$ is the Lagrangian 
multiplier for row i. $c_j(u)$ is the Lagrangian cost associated with column j, 
and is defined by 

$$ c_j(u) = c_j - \sum_{i=1}^{m} a_{ij}\, u_i $$
Equation 5 
An optimal solution to Equation 4 is given by 

$$ x_j(u) = \begin{cases} 0 & \text{if } c_j(u) > 0 \\ 1 & \text{if } c_j(u) < 0 \\ 0 \text{ or } 1 & \text{if } c_j(u) = 0 \end{cases} $$
Equation 6 

L(u) can also be seen as an estimate of the lower bound for 
the solution, i.e. the sum of the costs for the columns in the optimal solution 
to the SCP will be ≥ L(u). The solution to the SCP can be found by finding 
an optimal multiplier vector u instead, but this will require much 
computation, especially for a large SCP. But near-optimal multiplier vectors 
can be found within a short time by using the subgradient vector s(u), defined 
by 

$$ s_i(u) = 1 - \sum_{j=1}^{n} a_{ij}\, x_j(u), \qquad i = 1, \ldots, m $$
Equation 7 



u can be refined iteratively by using for example 

$$ u_i^{k+1} = \max\left\{ u_i^{k} + \lambda\, \frac{UB - L(u^k)}{\lVert s(u^k) \rVert^2}\, s_i(u^k),\; 0 \right\} $$
Equation 8 

where λ > 0 is a step-size parameter and UB is an upper 
bound on the value of the solution. The initial $u^0$ can be defined arbitrarily. 
To solve the SCP, first a near-optimal multiplier vector u is found. This and 
Equation 6 are then used as a basis to form a feasible solution. The upper 
bound UB can then be updated to the value of this feasible solution (if it is 
better than the previous best solution), and a new near-optimal multiplier 
vector found and so on until convergence is reached. 
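
Equations 4 to 8 can be put together in a short sketch. The fragment below is 
illustrative only: the function names, the all-zero starting vector, the fixed 
step-size parameter, the fixed iteration count and the example matrix are 
assumptions, and each row is required to be covered once. 

```python
def lagrangian_bound(rows, cost, u):
    """Evaluate L(u), x(u) and the subgradient s(u) for multipliers u >= 0."""
    m, n = len(rows), len(cost)
    # Equation 5: Lagrangian cost of each column.
    red = [cost[j] - sum(rows[i][j] * u[i] for i in range(m)) for j in range(n)]
    # Equation 6: select exactly the columns with negative Lagrangian cost.
    x = [1 if r < 0 else 0 for r in red]
    # Equation 4: value of the Lagrangian (a lower bound on the optimum).
    bound = sum(r * xj for r, xj in zip(red, x)) + sum(u)
    # Equation 7: subgradient, one component per row.
    s = [1 - sum(rows[i][j] * x[j] for j in range(n)) for i in range(m)]
    return bound, x, s


def subgradient(rows, cost, ub, lam=0.1, iters=200):
    """Refine u with Equation 8 and return the best lower bound found."""
    u = [0.0] * len(rows)  # assumption: arbitrary starting vector
    best = float("-inf")
    for _ in range(iters):
        bound, _, s = lagrangian_bound(rows, cost, u)
        best = max(best, bound)
        norm2 = sum(si * si for si in s)
        if norm2 == 0:
            break  # x(u) covers every row exactly once
        step = lam * (ub - bound) / norm2
        u = [max(ui + step * si, 0.0) for ui, si in zip(u, s)]  # Equation 8
    return best, u


# Hypothetical binary matrix with unit costs; a cover of size 2 exists.
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
cost = [1.0, 1.0, 1.0, 1.0]
best, u = subgradient(rows, cost, ub=2.0)
print(round(best, 3))
```

Because L(u) never exceeds the value of the optimal cover, the returned bound 
can be compared with the value of any feasible solution to judge how far 
that solution may be from the optimum. 
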
Another alternative computational method that may be 
employed to solve such an SCP is 'surrogate relaxation', in which in each 
iteration a corresponding continuous problem is solved and made feasible 
before a sub-gradient algorithm is applied. Alternatively, genetic algorithms 
may be employed in which the 'genome' consists of n bits, one bit for each 
of the columns. 

It should also be borne in mind that as the SCP operates on 
the binary matrix which only represents differences in signals between 
sequences for the same primer, a primer in the selected reduced set may 
generate a negative, '-', signal rather than a positive signal, A, C, G, T. To 
be sure that the sample does in fact contain a particular sequence it is 
essential to ensure that for each sequence at least one primer generates a 
positive signal. Furthermore, in practice redundancy is desirable as all 
reactions may not occur as intended. Therefore, the least number of 
positive signals as well as the least number of differences in the signal 
pattern is preferably larger than one. 

With reference to Figure 3, the following is a description of 
one method of selecting a reduced set of primers. 

Firstly, all possible primers are selected (10) using the 
standard APEX procedure to produce a first set of primers. During this 
selection a substring of the sequence to be analysed is used to construct 
one primer, then the substring is displaced by one base and another primer 
is constructed. This process is carried out from the start of the sequence 
until the entire sequence has been covered. Both strands of DNA are used 
and this is repeated for all sequences. The primers should be long enough 
to be capable of discriminating between exact matches and mismatches 
involving one or two nucleotide pairs. Conveniently, the primers are 13 bp 
long as this has been found to be sufficient to ensure the reaction, or longer 
to increase hybrid stability. However, to avoid steric hindrance on the chip 
each primer may be 5'-tailed. In this example, twelve T's are added to the 
5'-end of the primer so that the final length of the primers is 25 bp. 
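
The tiling step can be sketched as below. The function names, the removal of 
duplicate primers and the toy input sequence are assumptions made for 
illustration; the 13-base window and the tail of twelve T's follow the 
example in the text. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_primers(sequences, length=13, tail="T" * 12):
    """Slide a window of `length` bases, one base at a time, over both strands.

    Duplicate windows are kept only once (an assumption), and the 5' tail is
    prepended to every primer.
    """
    cores = set()
    for seq in sequences:
        for strand in (seq, reverse_complement(seq)):
            for start in range(len(strand) - length + 1):
                cores.add(strand[start:start + length])
    return sorted(tail + core for core in cores)


# Hypothetical 14-base sequence: two windows per strand.
primers = tile_primers(["ACGTACGTACGTAA"])
for p in primers:
    print(p, len(p))
```
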

Next all primers that are not suitable as primers are rejected 
(12) and the rest are included in a primary primer set. Unsuitable primers 
are those where the three bases at the 3'-end are complementary to any 
substring of the primer. In some instances this can result in the primer 
being extended using a neighbouring primer, and not the sample DNA, as a 
template, and for that reason such primers are considered unsuitable. 
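
One possible reading of this rule is sketched below: a primer is flagged if 
the reverse complement of its three 3'-terminal bases occurs anywhere in the 
primer. The function names are hypothetical, and applying the test to the 
13-base core only (without the poly-T tail) is an assumption, since the text 
does not say whether the tail is included. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_self_extend(core):
    """True if the 3'-terminal triplet can anneal to a site within the primer."""
    target = reverse_complement(core[-3:])
    return target in core


def primary_set(cores):
    """Keep only the primers that pass the self-extension test."""
    return [c for c in cores if not can_self_extend(c)]


# Hypothetical primer cores.
print(can_self_extend("TTTTACGTAAACG"))  # 3' end ACG pairs with CGT in the primer
print(can_self_extend("AAAAAAAAAAAAC"))  # GTT does not occur in the primer
```
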

Also, any primers that would produce ambiguous signals are 
identified and rejected (14). A primer produces an ambiguous signal where 
it is not known which of the four bases is in the relevant position. 

Each of the remaining primers in the primary primer set is 
then compared to each sequence in turn to determine whether the primer is 
extendible by each sequence and, if the primer is extendible, the base with 
which it would be extended is determined. A signal matrix of the primers 
with respect to each of the sequences is thus generated (16). 

In order for a primer to be extended using the sample DNA as 
template, the three bases in the 3'-end of the primer must hybridise to the 
DNA. Otherwise the enzyme responsible for the extension will not be able 
to add a nucleotide to the primer. Of the rest of the primer (the poly-T tail 
excluded), at most two mismatches are allowed, otherwise the primer-DNA 
duplex is considered to be too unstable to be extended. 
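
These two conditions can be sketched as follows. For simplicity the primer 
core is compared with the strand of the same sense, so that the incorporated 
base is the one following the matched site on that strand; this framing, the 
function name `extension_signals` and the toy sequences are assumptions. 

```python
def extension_signals(core, strand, max_mismatches=2):
    """Return the set of bases by which `core` could be extended on `strand`.

    core    primer without its poly-T tail, written 5' to 3'
    strand  strand of the same sense as the primer (the complement of the
            template the primer would anneal to)
    """
    length = len(core)
    signals = set()
    # Stop one base early: a base must follow the site to be incorporated.
    for i in range(len(strand) - length):
        site = strand[i:i + length]
        if site[-3:] != core[-3:]:
            continue  # the three 3'-terminal bases must match exactly
        mismatches = sum(a != b for a, b in zip(core[:-3], site[:-3]))
        if mismatches <= max_mismatches:
            signals.add(strand[i + length])
    return signals


core = "ACGTACGTACGTA"
print(extension_signals(core, "ACGTACGTACGTAC"))  # perfect match
print(extension_signals(core, "TTGTACGTACGTAG"))  # two mismatches, tolerated
print(extension_signals(core, "ACGTACGTACGAAC"))  # 3' mismatch, no extension
```

An empty set corresponds to "no signal" for that primer and sequence. 
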

In ordinary PCR, all the bases must match in order for the 
primer to be extended. But then the temperature is raised to the melting 
point, Tm, of the primer in the extension step. In APEX, this reaction is 
carried out at 45°C, which is around 10-20°C below the Tm of most primers. 
This means that the primers will hybridise to the DNA despite a few 
mismatches, which is why two mismatches are allowed here. 

In some cases a primer could hybridise to a sequence in 
more than one position, and sometimes a primer could hybridise to both 
strands of one allele and give different signals. In those cases all the 
different signals are combined to form one resulting signal (e.g. 'A' and 'C' 
together form 'M', which is the NC-IUB (NC-IUB, 1985) code for this 
combination). 
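
Combining signals in this way amounts to a lookup in the standard nucleotide 
ambiguity codes. The sketch below assumes a function name `combine_signals` 
and the use of '-' for "no signal"; the code table itself is the standard 
NC-IUB/IUPAC one. 

```python
# Standard ambiguity codes, keyed by the sorted set of bases they stand for.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def combine_signals(signals):
    """Merge all signals a primer gives on one sequence into a single code."""
    key = "".join(sorted(set(signals) - {"-"}))
    return AMBIGUITY[key] if key else "-"


print(combine_signals({"A", "C"}))
print(combine_signals({"G"}))
print(combine_signals(set()))
```
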

For each column of the signal matrix the entries for each row 
are compared against one another, in other words for each primer the 
signals produced by the primer for each sequence are compared against 
each other. A binary matrix is thus generated (18) of the primers with 
respect to the identity or difference of signals for pairs of sequences. The 
binary matrix contains non-zero entries where the primer is able to 
distinguish between a pair of sequences. 

The number of pairs of sequences that each primer can 
distinguish between is counted and a score is allocated to each primer 
(20) in dependence on the total number of pairs of sequences counted. 
Thus, the number of non-zero elements for each primer is counted. 
Primers that are unable to distinguish between any pairs of sequences are 
rejected (22) and the remaining primers are sorted (24) in order of their 
score with the primers with the higher scores at the beginning. 

A core of primers is created next (26). The primer with the 
highest score is selected. Where two primers with equal scores exist, the 
number of positive signals is determined for each and the primer with the 
greater number of positive signals is chosen. If both primers remain equal, 
one is then selected arbitrarily over the other. After the main primer has 
been selected, the first twenty (five times the desired redundancy, which is 
four here) primers giving positive signals for each sequence in turn are 
selected for the core. All remaining primers are rejected. 
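
The scoring, sorting and core-building steps (20) to (26) might be sketched 
as below. The function names, the example matrices and the use of '-' for 
"no signal" are assumptions; ties that remain after comparing positive 
signals are resolved here by column order, which is one arbitrary choice. 

```python
def rank_primers(rows, signal):
    """Sort primers by number of distinguishable pairs, then positive signals.

    Primers that distinguish no pair of sequences are dropped.
    """
    n = len(rows[0])
    score = [sum(row[j] for row in rows) for j in range(n)]
    positives = [sum(1 for seq in signal if seq[j] != "-") for j in range(n)]
    kept = [j for j in range(n) if score[j] > 0]
    return sorted(kept, key=lambda j: (-score[j], -positives[j]))


def build_core(ranked, signal, redundancy=4, factor=5):
    """For each sequence keep its first factor * redundancy positive primers."""
    limit = factor * redundancy
    core = set()
    for seq in signal:
        positive = [j for j in ranked if seq[j] != "-"]
        core.update(positive[:limit])
    return [j for j in ranked if j in core]


# Hypothetical signal matrix and its binary matrix (3 sequences, 4 primers).
signal = [
    ["A", "G", "C", "-"],
    ["C", "G", "C", "-"],
    ["C", "T", "C", "A"],
]
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
ranked = rank_primers(rows, signal)
print(ranked, build_core(ranked, signal))
```
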

A greedy algorithm is then run (28) using the core set of 
primers to identify the minimum number of primers necessary to distinguish 
each sequence. As the greedy algorithm is run, primers are added one at 
a time with each primer being selected in turn in relation to the number of 
uncovered rows it is capable of covering. When all rows are covered at 
least four times the reduced set of primers is checked for any sequences 
that have fewer than four positive signals and extra primers are added as 
necessary to meet this minimum requirement. 
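
A sketch of this step with a configurable redundancy e (four in the text) is 
given below. The function names, the unit costs, the order in which extra 
primers are added and the example data are assumptions. 

```python
def greedy_multicover(rows, e=4):
    """Add columns greedily until every row is covered at least e times."""
    m, n = len(rows), len(rows[0])
    need = [e] * m  # remaining coverage required for each row
    chosen = []

    def gain(j):
        return sum(1 for i in range(m) if need[i] > 0 and rows[i][j])

    while any(need):
        candidates = [j for j in range(n) if j not in chosen]
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # the remaining rows cannot be covered e times
        chosen.append(best)
        for i in range(m):
            if rows[i][best] and need[i] > 0:
                need[i] -= 1
    return chosen, need


def top_up_positives(chosen, signal, e=4):
    """Add primers until each sequence has at least e positive signals."""
    chosen = list(chosen)
    for seq in signal:
        have = sum(1 for j in chosen if seq[j] != "-")
        for j in range(len(seq)):
            if have >= e:
                break
            if j not in chosen and seq[j] != "-":
                chosen.append(j)
                have += 1
    return chosen


# Hypothetical data: every row has exactly two covering columns.
rows = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(greedy_multicover(rows, e=2))
```

If a row or sequence cannot reach the requested redundancy, the sketch stops 
and reports the shortfall in `need` rather than failing. 
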

A redundancy check is then performed (30) to identify 
whether any more primers can be removed. During the redundancy check 
each primer is "tentatively" removed in turn to see whether the remaining 
primers meet the minimum requirements. 

If not, the next primer is tried. Otherwise the primer is 
temporarily removed from the set, and the process continues with the next 
primer in line. This process continues until no more primers can be 
removed, in which case the last primer to be removed is added back to the 
set, and the next primer in line tentatively removed and so on. This can be 
viewed as a depth-first search of a tree where the nodes are combinations 
of primers, and the number of primers in each node is one less than in a 
node one level above. The root node thus contains all primers from the 
greedy algorithm. It has p (the number of primers after the greedy 
algorithm) primers in it. It also has p child-nodes (because there are p 
ways in which you can remove one primer from a set of p primers), each 
with p-1 primers. Each of them has p-1 children with p-2 primers and so 
on. In this way, all possible combinations of primers in the set fulfilling the 
requirements are found, and those combinations with the same, least 
number of primers are saved as the final primer sets. 
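
The depth-first redundancy check can be sketched as follows. The function 
names are hypothetical, the "minimum requirements" are reduced here to row 
coverage only (the positive-signal requirement is omitted), the starting set 
is assumed to meet the requirements, and already-visited combinations are 
skipped, which is an optimisation not mentioned in the text. 

```python
def feasible(rows, cols, e=1):
    """True if every row is covered at least e times by the given columns."""
    return all(sum(row[j] for j in cols) >= e for row in rows)


def smallest_feasible_subsets(rows, cols, e=1):
    """Depth-first search over subsets obtained by removing one primer at a time."""
    best_size = len(cols)
    best_sets = set()
    seen = set()

    def visit(current):
        nonlocal best_size, best_sets
        if current in seen:
            return
        seen.add(current)
        children = [current - {j} for j in current]
        children = [c for c in children if feasible(rows, c, e)]
        if children:
            for child in children:
                visit(child)
        elif len(current) < best_size:
            best_size, best_sets = len(current), {current}
        elif len(current) == best_size:
            best_sets.add(current)

    visit(frozenset(cols))
    return sorted(sorted(s) for s in best_sets)


# Hypothetical binary matrix and a greedy result containing a redundant primer.
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
print(smallest_feasible_subsets(rows, [0, 1, 3]))
```

The search is exponential in the worst case, so it is only practical when 
the set produced by the greedy step is already close to minimal. 
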

Instead of applying the greedy algorithm to the core set, a 
modified algorithm called CFT may be applied. 

Lagrangian subgradient 

This algorithm consists of three main phases: a subgradient 
phase where a near-optimal multiplier vector is found, a heuristic phase 
where a solution to the SCP is found, and column fixing, designed to 
improve the results of the heuristic phase. 
In the subgradient phase, a near-optimal multiplier vector u is 
found using Equation 8. At the beginning, the starting vector $u^0$ used is 
defined as 

$$ u_i^0 = \min_{j \,:\, a_{ij} = 1} \frac{c_j}{\sum_{r=1}^{m} a_{rj}} $$
Equation 9 
Later calls use the last vector u before column fixing, and 
apply a small perturbation before using it as the starting vector. The 
perturbation is randomly (and uniformly) distributed in the range ±10% for 
each element. The sequence of multiplier vectors is considered to have 
converged when the improvement in L(u) in the last 50 iterations is smaller 
than 0.1%, or when the number of iterations reaches 10 x m. The factor λ 
in Equation 8 was set to 0.1 at the beginning, and was updated as follows: 
every 20 iterations, the best and worst lower bounds L(u) during those 20 
iterations are compared to each other. If the difference is larger than 1%, 
the value of λ is halved. If the difference is less than 0.1%, λ is multiplied 
by 1.5. In the first call, the upper bound, UB, used is the sum of the costs 
of the first primers that together cover all rows four times. Otherwise it is 
the value of the best solution found so far. 
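
The step-size rule can be sketched as a small helper. The function name 
`update_step` is hypothetical, and measuring the difference relative to the 
best bound in the window is an assumption, since the text gives only the 
percentages. 

```python
def update_step(lam, recent_bounds):
    """Adjust the step-size parameter from the last 20 lower bounds L(u).

    Halve it if best and worst bounds differ by more than 1 %,
    multiply it by 1.5 if they differ by less than 0.1 %.
    """
    best, worst = max(recent_bounds), min(recent_bounds)
    if best == 0:
        return lam  # relative difference undefined; leave unchanged
    gap = (best - worst) / abs(best)
    if gap > 0.01:
        return lam / 2
    if gap < 0.001:
        return lam * 1.5
    return lam


print(update_step(0.1, [100.0, 98.0]))   # bounds oscillate: smaller steps
print(update_step(0.1, [100.0, 99.95]))  # bounds stagnate: larger steps
print(update_step(0.1, [100.0, 99.5]))   # in between: unchanged
```
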

In the heuristic phase, the last vector from the subgradient 
phase is used to generate a sequence of multiplier vectors (again using 
Equation 8), and a feasible solution constructed for each of the multiplier 
vectors. The procedure used to generate a feasible solution is a variation 
of the greedy algorithm, where each column is scored according to 

$$ \sigma_j = \begin{cases} \gamma_j / \mu_j & \text{if } \gamma_j > 0 \\ \gamma_j\, \mu_j & \text{if } \gamma_j \le 0 \end{cases} $$
Equation 10 

where R is the set of uncovered rows in each step and, as in the cited 
method of Caprara et al, $\gamma_j = c_j - \sum_{i \in R} a_{ij} u_i$ and 
$\mu_j = \sum_{i \in R} a_{ij}$ is the number of uncovered rows covered by column j. The 
column with the lowest $\sigma_j$, i.e. the column with the best "gain/cost"-ratio, is 
added in each step to the solution. This continues until no improvements to 
the best solution (i.e. minimum number of primers) have been made for 50 
iterations. 
After the heuristic phase column fixing is applied to the 
solution. Columns that are absolutely necessary in order for a row to be 
covered (i.e. if there are only e columns covering a row and each row is to 
be covered e times) are fixed. These fixed columns are then used as a 
starting point for the greedy algorithm, and the first max{⌊200/m⌋, 1} 
columns chosen therein are fixed as well. 
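
The first part of column fixing, identifying the columns that are absolutely 
necessary, can be sketched as follows. The function name 
`mandatory_columns` and the example matrix are assumptions; the subsequent 
greedy step is not shown. 

```python
def mandatory_columns(rows, e=4):
    """Return columns that must be in any solution covering each row e times.

    A row covered by exactly e columns forces all of those columns.
    """
    fixed = set()
    for row in rows:
        covering = [j for j, a in enumerate(row) if a]
        if len(covering) == e:
            fixed.update(covering)
    return sorted(fixed)


# Hypothetical binary matrix: 3 sequence pairs x 4 primers.
rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
print(mandatory_columns(rows, e=1))
print(mandatory_columns(rows, e=2))
```
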

These three phases are then applied again to the problem, 
with the condition that the fixed columns must be included in the solution 
this time. Columns already fixed in a previous round cannot be removed 
from the solution. This goes on until either all rows are covered by the 
fixed columns, or the cost of the fixed columns is larger than the estimated 
lower bound for the entire problem, or no new columns were fixed in the 
last iteration. 

When the three phases are done, the problem is refined in 
order to improve the solution. Here, each column in the best solution found 
so far is scored according to 

$$S_j = \max\{c_j(u^*),\, 0\} + \sum_{i \in I_j} u_i^* \, \frac{K_i - 1}{K_i}$$ 

Equation 11 

where 

$$K_i = |J_i \cap S|$$ 

Equation 12 

and S is the set of columns in the solution, so that K_i is the number of 
columns in the solution covering row i. The term u_i(K_i - 1) 
is the contribution of row i to the gap between the estimated lower and 
upper bound of the problem. This is then split uniformly between all 
columns in the solution covering that row. Columns with small S_j 
(contributing the least to the gap) are then likely to be part of the optimal 
solution. The p columns with the smallest S_j are then fixed before the entire 
algorithm is applied again to the resulting sub-problem. (Column fixing 
here has nothing to do with column fixing after the heuristic phase, so 
columns fixed there need no longer be fixed here.) p is the smallest value 
satisfying 

$$\sum_{k=1}^{p} |I_{j_k}| \ge \pi \, e \, m$$ 

Equation 13 

where {j_k} is the set of columns in the solution ordered with 
ascending S_j, and I_j is the set of rows covered by column j. π is in the range 
0...1 and controls the percentage of rows removed after fixing; π = 
1 means that no rows will remain uncovered, while π = 0 means that no 
columns will be fixed before reapplying the algorithm. (Since each row has 
to be covered multiple times, in this case it is not actually the number of 
rows but the number of elements covering the rows that is regulated by 
π.) In the beginning, π is set to 0.3 and is multiplied by α = 1.1 if the best 
solution so far was not improved in the last application of the three main 
phases. If a better solution was found, π was reset to 0.3. Because of the 
density of the matrices, the number of columns fixed in this step was also 
set to be at least one more than in the previous iteration (if no 
improvements were made). Otherwise the same number of columns would 
be fixed in a number of iterations before the value of π is large enough to 
allow more columns to be fixed. 
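
A minimal sketch of how the number of columns to fix in the refining step might be chosen is given below. It assumes that the condition of Equation 13 compares the number of elements covered by the first p columns with the fraction π of the e times m elements required in total; this reading, the cap of π at 1 and all function and parameter names are assumptions made for the example.

```python
def columns_to_fix(solution, score, rows_of_col, pi, e, m, prev_p=0, improved=True):
    """Return the columns of the current solution to fix before re-running the algorithm.

    solution    -- column indices of the best solution so far
    score       -- dict: column -> score of that column
    rows_of_col -- dict: column -> set of rows covered by that column
    pi, e, m    -- fraction in 0..1, required coverage per row, number of rows
    """
    ordered = sorted(solution, key=lambda j: score[j])   # ascending score
    target = pi * e * m                                   # assumed right-hand side
    p, covered = 0, 0
    while p < len(ordered) and covered < target:
        covered += len(rows_of_col[ordered[p]])
        p += 1
    if not improved:                                      # fix at least one more column
        p = min(max(p, prev_p + 1), len(ordered))
    return ordered[:p]


def next_pi(pi, improved, start=0.3, alpha=1.1):
    """Reset pi after an improvement, otherwise enlarge it by alpha (capped at 1)."""
    return start if improved else min(pi * alpha, 1.0)


score = {0: 0.5, 1: 0.1, 2: 0.3}
rows = {0: {0, 1}, 1: {0}, 2: {1, 2}}
print(columns_to_fix([0, 1, 2], score, rows, pi=0.3, e=1, m=3))
```

With π = 0 the loop never runs and no column is fixed, which matches the description of the limiting case.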

The algorithm is iterated until the value of the best 
solution is less than the estimated lower bound, all columns in the best 
solution found so far are already fixed in the refining step, or a time limit is 
exceeded. The time limit in this case was arbitrarily set to as many 
seconds as there were rows in the problem. However, the time limit is only 
checked before the refining step. If it is not exceeded, a whole iteration of 
the algorithm will be executed before another check is done. Here too, a 
check was done afterwards to see if primers could be removed without 
breaking any constraints. 

With this algorithm no pricing is performed. Pricing is used to 
update the core problem, exchanging columns between the core problem 
and columns outside the core. It was not included here since it was argued 
that, since the costs of the columns are all the same, the best columns 
would be those with the largest number of non-zero elements. These 
would be the first columns to be added to the core, and the columns not 
included in the core would most probably not be better than those included. 
Also, the pricing step would require some computation, which would extend the 
time required by this algorithm. As is, the computational requirement of this 
algorithm is several orders of magnitude higher than for the greedy 
algorithm. Finally, the main memory available in the computer puts a limit 
on how large the problems can be. If pricing were included, all data would 
not fit into the physical memory, forcing the computer to use a swap-file, 
which would increase the computation times considerably. 

Using both alternative algorithms described above, a minimum 
number of primers was identified for various sequences. The results are 
set out below. 

It will be apparent that the initial manual rejection of primers, 
steps (12, 14 and 22), need not be performed and instead the algorithms 
can be applied to the original complete set of primers. However, the initial 
rejection of obviously failed primer candidates can significantly reduce the 
computational time required in the later stages. Similarly, in many cases 
the final redundancy check (30) need not be performed, as 
little or no reduction in the number of primers was achieved by this final 
check. 

Furthermore, although in the method described above the 
primers were initially sorted in order of score, this need not be performed. 
The algorithms for stripping out redundant primers are capable of operating 
with any order of primers, including a wholly random order. However, 
slightly better results were obtained when ordering by score was 
performed. 

Collecting sequences 

The HLA sequences were available internally from 
Amersham Pharmacia Biotech (release December 1997), and included 91 
alleles from HLA-A, 202 HLA-B, 47 HLA-C, 11 HLA-DPA1 (coding for the 
α-chain), 74 HLA-DPB1 (β-chain), 18 HLA-DQA1, 34 HLA-DQB1, 192 
HLA-DRB1 and 35 sequences in all of HLA-DRB3, -DRB4 and -DRB5. The length of 
these sequences ranges from ~250 bp to ~1100 bp. 

The 16S rRNA sequences were collected from GenBank 
(Benson et al., 1998), an annotated database of all publicly available DNA 
sequences. Only a subset of all the available 16S rRNA sequences was 
used. The sequences used were all from organisms that could be 
identified using either the MicroLog or the MicroStation system from Biolog 
Inc., or the API systems from Counterpart Diagnostics. These systems 
utilise differences in metabolism in order to identify the organisms, which is 
the most common way of identifying micro-organisms today. Altogether, 
1207 sequences from 523 different organisms were collected from 
GenBank. 269 of those 523 organisms had only one 16S rRNA sequence 
among those 1207 sequences. The length of these sequences is between 
~1000 bp and ~1500 bp. 



Data set     No. sequences   Mean length of sequences 
DPA1         11              517 
DPB1         74              288 
DQA1         17              616 
DQB1         34              490 
DRB1         192             324 
DRB345       35              400 
HLA-A        91              944 
HLA-B        200             900 
HLA-C        47              1003 
16S rRNA     1207            1452 

Table 1: Details about data sets. 

The program was written using the Microsoft® Visual C++®, 
version 5.0 compiler. It was executed on a PC with a Pentium® MMX 233 
MHz processor, 64 MB RAM and Windows® 95, unless otherwise 
indicated. All execution times are for the entire program, including I/O. 

As can be seen in Table 2, the binary SCP matrices were 
quite dense. The density (i.e. the proportion of non-zero elements in the 
matrix) usually lies around a few percent, of course depending on the 
application. A higher density means that fewer columns are needed in 
order to cover all rows. This is offset in this case by the fact that all rows 
were required to be covered multiple times. Another consequence of this 
high density is that the number of primers needed according to the greedy 
algorithm could be much higher than in the optimal solution. (Recall that 
the worst case behaviour of the greedy algorithm is a function of the largest 
column-sum of elements.) 

Dataset       DPA1   DPB1   DQA1   DQB1   DRB1    DRB345  HLA-A  HLA-B  HLA-C  16S rRNA 
No. rows      55     2701   136    561    18336   595     4095   19900  1081   727821 
Density (%)   47.89  20.73  36.31  42.18  24.98   37.70   36.31  32.33  30.41  2.04 

Table 2: Some details about the binary SCP matrix. Data are 
calculated for all primers in the primary set. 
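
The row counts in Table 2 agree with one row per unordered pair of sequences, i.e. k(k-1)/2 rows for a data set of k sequences. The short check below uses the sequence counts of Table 1; the dictionary and function names are illustrative only.

```python
from math import comb

# Number of sequences per data set (Table 1) and number of rows (Table 2).
sequences = {"DPA1": 11, "DPB1": 74, "DQA1": 17, "DQB1": 34, "DRB1": 192,
             "DRB345": 35, "HLA-A": 91, "HLA-B": 200, "HLA-C": 47,
             "16S rRNA": 1207}
reported_rows = {"DPA1": 55, "DPB1": 2701, "DQA1": 136, "DQB1": 561,
                 "DRB1": 18336, "DRB345": 595, "HLA-A": 4095, "HLA-B": 19900,
                 "HLA-C": 1081, "16S rRNA": 727821}


def pair_rows(k):
    """Number of unordered pairs among k sequences, k(k-1)/2."""
    return comb(k, 2)


for name, k in sequences.items():
    print(name, pair_rows(k), pair_rows(k) == reported_rows[name])
```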

The program could be considered as consisting of two 
phases. The first phase involves constructing all primers and finding out 
what kind of signal they will give for each sequence. The second phase is 
the optimisation phase, where the SCP is solved. Some details about the 
first phase can be found in Table 3. 

Dataset       DPA1   DPB1   DQA1   DQB1   DRB1   DRB345  HLA-A   HLA-B   HLA-C  16S rRNA 
First set     1747   1885   2487   2891   3891   3031    4756    4994    4293   247877 
Primary set   1333   1475   2166   2730   3651   3016    3886    4585    3354   247877 
Core set      106    321    213    244    385    203     595     750     338    2377 
Time (s)      4.67   6.81   11.26  18.51  42.29  14.56   124.74  286.82  61.29  150632 

Table 3: Number of primers in different stages of the algorithm and time to 
get signals for all primers. The numbers of primers in the core set are for 
homozygotes. 

One explanation for this high density is that the sequences in 
the data sets are quite similar to each other, so that most primers will 
hybridise to and give signal for more than one sequence (either the same 
or different signals). This is also indicated in Table 3, where for some data 
sets there is a noticeable drop from the number of primers in the first set to 
the number of primers in the primary set. Most of this reduction is due to a 
primer having the same signal for all sequences, which in turn means that 
all sequences have a substring that is similar enough for the primer to 
hybridise to and that the nucleotide after the primer is the same for all 
sequences. In contrast, the 16S rRNA data set has a much lower density, 
and no reduction in the primers going from the first set of primers to the 
primary set. As the sequences in this data set come from organisms which 
might be only distantly related to each other, there need not be as much 
similarity between the sequences as there is in the HLA data sets. Another 
explanation is this: if all k sequences except one give the same signal for a 
primer, that column in the binary SCP matrix will have k-1 non-zero 
elements. The density (for that column) will then be (k-1) / (k(k-1)/2) = 2/k. 
In other words, the density will be higher for smaller values of k, and 
smaller for larger values. This means that it would be "natural" for smaller 
matrices to have higher densities, and larger matrices to have lower 
densities. 
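
The 2/k argument can be checked numerically with a small sketch. It assumes that an element of the binary SCP matrix is non-zero when the primer gives different signals for the two sequences of the pair; the function name and the single-letter signals are illustrative only.

```python
from itertools import combinations


def column_density(signals):
    """Share of sequence pairs for which the primer gives different signals."""
    pairs = list(combinations(range(len(signals)), 2))
    nonzero = sum(1 for a, b in pairs if signals[a] != signals[b])
    return nonzero / len(pairs)


for k in (11, 74, 192):
    signals = ["A"] * (k - 1) + ["G"]    # all sequences but one give the same signal
    print(k, column_density(signals), 2 / k)
```

For each k the two printed values coincide, since exactly k-1 of the k(k-1)/2 pairs contain the deviating sequence.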

In the second phase, solving the SCP, a few different 
approaches were tried. The results, the minimum number of primers 
needed and the time required to find this number, can be found in Table 4 
and Table 5. Even though the worst case behaviour of the greedy 
algorithm is not so good in this application, the results are not much worse 
than when using a Lagrangian subgradient (CFT) method. The greedy 
algorithm typically needs two or three more primers, while the computation 
times are much lower for the greedy algorithm. 

The results show that it is worthwhile to check the results 
from the greedy algorithm for redundancy. In all cases except one, primers 
could be removed and the resulting primer sets still fulfil all requirements. 
This is not true for the CFT algorithm, however, as there is only one 
instance in which the result could be improved. On the other hand, since 
there is some randomness in the CFT algorithm (an old multiplier vector is 
disturbed randomly before being used as a starting vector in the next 
iteration), the results can differ from one execution of the algorithm to 
another. Sometimes the results can be improved, and sometimes not 
(results not shown). 
Dataset     DPA1   DPB1   DQA1   DQB1   DRB1   DRB345  HLA-A  HLA-B   HLA-C  16S rRNA 
Greedy      11     42     32     31     48     24      73     103     51     210 
Time (s)    0.27   1.37   0.61   0.71   11.5   0.66    4.61   31.36   1.15   9921.48* 
Final       11     41     30     29     44     21      72     99      47     197* 
Total (s)   0.27   1.81   0.72   0.88   30.3   0.71    6.48   85.14   1.76   >300000* 

Table 4: No. of primers after the greedy algorithm and time 
spent by it. Also final no. of primers after the check for redundancy and the total 
time spent solving the SCP. *Value from a 300 MHz Pentium II with 512 MB 
RAM running Windows NT 4.0. The computation was halted before 
completion due to time constraints. 



Dataset     DPA1    DPB1      DQA1    DQB1     DRB345   HLA-A     HLA-C 
CFT         10      38        26      27       20       69        47 
Time (s)    10.22   2748.92   60.80   372.56   427.32   4547.33   1091.37 
Final       10      38        26      27       20       69        45 
Total (s)   10.22   2749.14   60.86   372.61   427.38   4548.49   1111.70 

Table 5: Results using modified algorithm CFT. 



One reason CFT is not much better than the greedy algorithm 
could be that it was designed for other instances of SCP. The SCP arising 
in this application differs in three aspects from those: A) the density is much 
higher, B) all rows are to be covered multiple times and C) the costs of all 
columns are the same. 

A comparison was made between the results from the greedy 
algorithm and from CFT in Table 6. Most of the primers (70% or more) 
were chosen by both algorithms, indicating that these primers are likely to 
be part of an optimal solution. However, this is only an indication, as the 
only way to prove this is to find an optimal solution. This will require far too 
much time even for the smallest data set, as the problem is NP-hard. 



Dataset       DPA1    DPB1    DQA1    DQB1    DRB345   HLA-A   HLA-C 
Greedy        11      41      30      29      21       72      47 
CFT           10      38      26      27      20       69      48 
Same          7       33      22      22      14       62      38 
Percent (%)   70.00   86.84   84.62   81.48   70.00    89.86   80.85 

Table 6: Comparison of primers from the two different 
algorithms. 
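
The percentages in Table 6 appear to be the number of shared primers divided by the size of the smaller of the two primer sets. This reading is an assumption, as the text does not state the denominator, but it reproduces every value in the table, as the following sketch shows. The names used are illustrative only.

```python
# Values copied from Table 6: (greedy, CFT, same, percent).
table6 = {
    "DPA1":   (11, 10, 7, 70.00),
    "DPB1":   (41, 38, 33, 86.84),
    "DQA1":   (30, 26, 22, 84.62),
    "DQB1":   (29, 27, 22, 81.48),
    "DRB345": (21, 20, 14, 70.00),
    "HLA-A":  (72, 69, 62, 89.86),
    "HLA-C":  (47, 48, 38, 80.85),
}


def shared_percent(greedy, cft, same):
    """Shared primers as a percentage of the smaller primer set (assumed reading)."""
    return round(100.0 * same / min(greedy, cft), 2)


for name, (greedy, cft, same, reported) in table6.items():
    print(name, shared_percent(greedy, cft, same), reported)
```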



Results from combining HLA sequences in order to 
differentiate between heterozygous individuals can be found in Table 7. 
CFT was only used for the two smallest data sets due to the time 
requirements. It performed slightly better than the greedy algorithm on those, 
but only by one primer on each data set. There are heterozygotes that 
cannot be distinguished from another heterozygote, which can be seen in 
Table 7. This happens because the combination of two sequences to form 
one heterozygote could result in exactly the same signal pattern as another 
combination of homozygotes. In other words, some rows in the signal 
matrix will be the same, leading to some rows in the binary SCP matrix not 
containing any non-zero elements at all. For some of those pairs listed, 
this is not true, however. They are listed because there were not enough 
primers that have different signals for these pairs, and so they could not meet 
the requirement of at least four different signals in the signal patterns 
(Table 8). For the rest, it is simply a limitation of this technique to type 
HLA genes. To be able to identify the alleles forming each heterozygote, 
primers that amplify alleles selectively should be used in the PCR step. 
This will remove the ambiguities, as some heterozygotes will simply be 
transformed to homozygotes, since only one of the alleles in the 
heterozygote will be amplified and not the other. 
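
How two different allele combinations can give the same signal pattern may be illustrated with a toy example. The sketch assumes that, for each primer, a sample carrying two alleles shows the union of the signals of the two alleles; this combination rule, the allele names and the signals are invented for illustration and are not taken from the data sets.

```python
from itertools import combinations_with_replacement


def genotype_pattern(a, b, signals):
    """Signal pattern of a sample carrying alleles a and b (union per primer, assumed)."""
    return tuple(frozenset(x) | frozenset(y) for x, y in zip(signals[a], signals[b]))


def ambiguous_genotypes(signals, min_diff=4):
    """Pairs of genotypes whose patterns differ at fewer than `min_diff` primers."""
    genotypes = list(combinations_with_replacement(sorted(signals), 2))
    patterns = {g: genotype_pattern(g[0], g[1], signals) for g in genotypes}
    clashes = []
    for i, g in enumerate(genotypes):
        for h in genotypes[i + 1:]:
            diff = sum(p != q for p, q in zip(patterns[g], patterns[h]))
            if diff < min_diff:
                clashes.append((g, h, diff))
    return clashes


# Hypothetical signals of four alleles for two primers.
toy = {"a1": ["A", "C"], "a2": ["G", "C"], "a3": ["A", "T"], "a4": ["G", "T"]}
print(ambiguous_genotypes(toy, min_diff=1))
```

In the toy data the combinations a1/a4 and a2/a3 both show A and G for the first primer and C and T for the second, so they cannot be told apart, which corresponds to a row without non-zero elements in the binary SCP matrix.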
Dataset       DPA1      DPB1      DQA1      DQB1     DRB345   HLA-A       HLA-C 
Greedy        26        130       51        81       94       172         94 
Time (s)      0.99      9229.57   7.41      294.51   453.19   20826.20*   1212.59 
CFT           25        -         50        -        -        -           - 
Time (s)      1943.82   -         8427.82   -        -        -           - 
Amb. het.     0         16        2         2        6        19          4 
Percent (%)   0.00      0.58      1.31      0.34     0.95     0.45        0.35 

Table 7: Results from heterozygous pairs. Number of primers 
needed, the time spent, how many heterozygotes did not differ by at 
least four signals from any other heterozygote, and the percentage of the total 
number of heterozygotes. *Value from a 300 MHz Pentium II with 512 MB 
RAM running Windows NT 4.0. 

Unfortunately, it was not possible to obtain any results for 
heterozygotes for the data sets DRB1 and HLA-B, as these were too large 
to run on existing machines. A very approximate extrapolation of the 
primers needed for these data sets suggests that the total number of 
primers for all HLA sets together would be <1000, which can be placed on one 
chip without problem (one chip can contain up to ~5000 primers). Without 
the reduction obtained above, at most two genes could be tested on each 
chip. With the reduction, all nine HLA genes and the 16S rRNA gene can 
be tested on one chip, with plenty of room to spare for other genes as 
well. This makes APEX more versatile, as it allows a family of related 
genes to be tested using only one chip instead of several. 
Paw 1 OPBTO501 OP81*2t01 
p,l,2 OP8r220t OP81-3601 
No. dlM. 2 

Palfl OPBV0501 OPBV5501 
Pair 2 OPBI'3001 OPB1*6301 
No. d Iff- 2 

Palrl OPB1*0601 DPBA-3601 
Pair 2 DPB1'200tt>PB1*2101 
No. dllf. 1 

Pair 1 DPBI'OBOI DPB1M401 
Pair 2 OPB1MO01 DPBV5701 
No.dlff. 0 

Pair 1 OPBt'0901 OPBf300t 
Pair 2 OPB1M701 DPBVS401 
No.dlff. 0 

Palrl OPBI'OBOI OP8f3«01 
Pair 2 OPBI'2101 OPB1'3501 
No. dllf- 0 

Palrl OPBI'OBOt DPB1*4501 
Pair 2 DPB1M001 OPBV1401 
No.dlff- 0 

Palrl DPBI'3901 DPBr3301 
Pair 2 DPBV4001 OPBIM901 
No.dlff. 0 



OQA1 



DQB1 



DRB345 



HLA-C 



Palrl OQAI-0101 OQAfOlO* 
Pair 2 DOAi*0101 OQA1"0105 
No. dlff. 3 

Palrl 0QB1'0604 DQB1*0612 
Pair 2 DQB1"060B OQBI»0609 
No.dlff- 2 

Palfl ORB4-01011 DRB4-01011 
Pair 2 DRB4*01011 ORB4-0301N 
NO. dlff. ° 

Palrl ORB4*01011 ORB4-0103 
Pair 2 ORB4«0103 ORB4'0301N 
No.dlff. ° 

Palrl ORB4'020INDRB4'0201N 
Pair 2 DRB4-0201NDRB4'03O1N 
NO. dlff. 0 

Palrl CW1203 CW1602 
Pair 2 CwM2042 CW1601 
No.dlff. 0 



Palrl CW12042 CW1502 
Pair 2 CW1205 C**1503 
No.dlff. 0 



Palrl A*O101 A-2411N 
Pair 2 A-0104N A*2402 
No.dlff. 0 

Pair 1 A'020t A'0205 
Pair 2 A-0202 A'0206 
No.dlM. 1 

Pair 1 A-0201 A'0205 
Pair 2" A*0214 A'0222 
No.dlff. 1 

Palrl A*0201 A'0208 
Pair 2 A'0205 A*0220 
No.dlff- 0 

Pain A*0201 A'0213 
Pair 2 A*0212 A'0226 
NO. dlff. 2 

Pair 1 A*0201 A'2406 
Pair 2 A*0222 A*2413 
No.dlff. 0 

Pair 1 A-0202 A'0206 
Pair 2 A*0214 AMJ222 
No.dHf. 0 

Palrl A-0212 A'2601 
Pair 2 A'0222 A'2608 
No. dllf. 2 

Palrl A'2402 A'2502 
Pair 2 A '2407 A*2501 
No. dlff. 0 

Palrl A'2402 A-6B012 
Pair 2 A*2407 A'68031 
No.dlff- 0 



Pair 1 A'2501 
Pair 2 A'2502 
No. dlff. 



A*680t2 
A*68031 



Table 8: Heterozygous pairs that do not differ enough in their signal 
patterns, and how many signals they differ with. 

The results of this work are summarised in the following 
Table 9. 

Class I    Number of alleles   Primers needed     Class II   Number of alleles   Primers needed 
HLA-A      91                  172                DPA1       11                  26 
HLA-B      200                 <1000              DPB1       74                  130 
HLA-C      47                  94                 DQA1       17                  51 
                                                  DQB1       34                  84 
                                                  DRB1       192                 <1000 
                                                  DRB345     35                  94 

Table 9. Number of primers needed to discriminate between 
heterozygote HLA samples. 

Some sets of primers indicated in Table 9, and also the set 
indicated for 16S rRNA, are set out in appendix 2. 

Primers can be arranged on the surface of a support in such 
a way that different studied types, genes, alleles, species etc. form easily 
recognised characters such as figures or letters. These character-forming 
primers can be additional primers of common origin from the gene of 
interest and be used for validation of the process. 

The following demonstration is based on the HLA Class II 
DQB gene. 

Experimental 

Materials 

Amplification: 

DNA: Four cell lines homozygous for DQB, with alleles 0402, 0301, 06011 
and 0201. 

Primers: Primer DQB 9246 from Williams et al. -96 and DQB 96012 from 
Amersham Pharmacia Biotech HLA DQB typing kit, covering exon 2, 



WO 00/65088 PCT/EP00/03636 

-28- 



generating a fragment of 300 base pairs. 

Amplification reagents: PCR mix from the Amersham Pharmacia Biotech 
HLA DQB typing kit, a prototype kit. 

All amplifications were spiked with dUTP, to get a final concentration of 100 
or 200 mM dUTP. 

Enzymes for fragmentation of PCR products: 
Shrimp alkaline phosphatase (SAP) 1 U/µl APB. 
Uracil-DNA-glycosylase (if from PE, UDG = UNG) 1 U/µl NE Biolabs. 

SAP will degrade (dephosphorylate) all free dNTPs and UDG 
will remove all dU from the DNA, and after heating the strands will be 
broken at these points. This step is applicable to any DNA fragment. 

Primers for spotting: 

All 84 primers for the 500 bp fragment were ordered from 
LTI/GIBCO BRL Custom primers service. All were 25-mers with an 
amino-activated 5'-end. For primer sequences see appendix 1. Self-extended 
primers were N, A, C, G and T as controls with the following sequences: 

N: amino TTT AGC CTT AAC GCC T N TGAC GTCA 

A, C, G, T: amino TTT AGC CTT AAC GCC T X TGAC GTCA, where X is 
A, C, G or T. 

Extension reagents for the APEX reaction 

Dyes: Specially synthesised for Baylor by Du Pont and/or APB 

Cy2-ddCTP (equal to fluorescein) 50 µM 
Cy3-ddATP 50 µM 
Texas Red-ddGTP 50 µM 
Cy5-ddUTP (often written as T in many of the reactions and results) 50 µM 

10x ThermoSequenase™ DNA polymerase buffer (TS): 
260 mM Tris-HCl pH 9.5; 65 mM MgCl2. 

ThermoSequenase DNA polymerase (Amersham Pharmacia Biotech) 4 U/µl; 
if needed, dilute with T.S. dilution buffer (= 10 mM Tris-HCl pH 8.0; 
1 mM β-mercaptoethanol; 0.5% Tween-20 (v/v); 0.5% Nonidet P-40 (v/v)). 
TS was used from a 150 unit stock and diluted 1 µl + 37 µl dilution buffer. 

Methods 

Preparation of glass slides before spotting of primer: 

Arrange 25-30 cover slips (24 x 60 mm) in a stainless staining tray. 
Immerse the tray in a glass staining dish with acetone so that the slides are fully immersed. 
Place the glass staining dish in a sonicator for 10 minutes. 
Remove the tray from the acetone bath, shake off excess acetone and rinse several times (at least twice) in MilliQ water. 
Immerse the tray in 100 mM NaOH and sonicate for 10 minutes (a few more minutes is no problem). 
Remove the tray, shake off excess NaOH and rinse several times (at least twice) in MilliQ water. 
Immerse the tray in silane solution and sonicate for 2 minutes. 
Wash the slides by immersion in 100% EtOH once. 
Dry the tray with the slides using nitrogen at a high velocity (without breaking the slides). 
Cure the slides in a vacuum oven at 100°C overnight or until they are used for spotting (at least 20 minutes of vacuum is needed). 

Spotting of oligos: 

All spotting was done with a spotter with 96 parallel capacity. 
Each slide was spotted with three replicas of the primers. 
After spotting, the slides were allowed to air dry for 5 to 15 
minutes; when dry they were marked. They were stored at room 
temperature, in a dry place, in the trays until used. 






PCR amplification 

The DQB amplification was done according to the method 
described by Williams et al. -96 using a 33% dUTP mix. After 40 cycles 
(95°C, 30 sec; 55°C, 30 sec; 72°C, 30 sec), one microliter of the PCR 
products was tested on a 1.5% agarose gel, before the fragmentation step. 

Williams, Bassinger, Moehlenkamp, Wu, Montoya, Griffith, 
McAuley, Goldman, Maurer: Strategy for distinguishing a new DQB1 allele 
(DQB1*0611) from the closely related DQB1*0602 allele. Tissue Antigens, 
1996, 48:143-147. 

Fragmentation of PCR products: 

Before APEX can be done, all DNA fragments must be 
fragmented so that all new fragments can get access to the primers on the chip. 

Set up: 

5 µl DNA from a PCR reaction (1/10 of the PCR reaction) 
2 µl SAP (Shrimp alkaline phosphatase) 1 U/µl APB 
1 µl UDG (Uracil-DNA-glycosylase) 1 U/µl NE Biolabs 
15 µl water 
Total: 23 µl 

Incubate at 37°C for 2 hours. 

The samples were frozen and stored until they were used. 

Inactivation of enzymes at 100°C for 10 minutes can be done, 
but is not needed since this is the first step in the APEX reaction. 

Extension method for the APEX reaction 

Slide treatment: 

Start by washing the slides in hot water (90 - 98°C, not 
boiling) for 2 x 5 minutes in a 50 ml Falcon tube. When the slides are 
ready, remove them from the tube with forceps and place them on a dry 
heater block at 48°C. The slide (= DNA chip) is now ready for adding the 
reactions. 

APEX reaction set-up: 

23 µl DNA from the fragmentation step. 
3 µl 10x TS reaction buffer (the rest of the buffer comes from PCR and 
UDG cleavage) 
17 µl for cover slip method. 
Heat denature at 100°C for 7-10 minutes, target 8 minutes, not longer. 
Spin the tube quickly and quickly add 
1 µl ThermoSequenase DNA polymerase (4 U) 
1 µl Dye-mix (50 µM of the four dideoxynucleotides A, C, G and T, 
separately dye labelled). 

The reaction mix was then physically spread out over the 
primer array with the tip of a pipette. Incubate at 48°C until no trace of 
solution is seen. This takes about 8 minutes. 

Wash with hot water for 2 - 5 minutes, 2 times. Ready to 
read on detection instrument. 

Detection 

The detection system is a total internal reflection fluorescence 
(TIRF) system, where microscopic slides are placed on top of a prism, with 
oil used to couple a laser beam into the glass slide. The system has light of five 
different wavelengths from five different lasers to choose between. In this 
experiment only four were used. To detect Cy2 a 488 nm laser was 
used, for Cy3 a 532 nm laser, for Cy5 a 635 nm laser and for Texas Red a 670 nm 
laser. Image-related software was based on Image Pro Plus 
3.0. 

Results 

Amplification of HLA DQB alleles 

The DNA from the four DQB homozygote cell lines was 
amplified according to the protocol in Williams et al. -96 with two different 
concentrations of dUTP. In addition to this, DNA from six different 
heterozygotes was amplified. All amplifications worked well and the 
expected 300 bp fragment was seen from all samples. 

APEX reaction with DQB chip 

Primer chips were washed and fragmented PCR products 
were incubated on the chip according to the protocol. The image was 
compared to the expected pattern. The recorded pattern was similar to, but 
somewhat different from, the expected pattern; the reason for this is that the 
set-up was planned for a 500 bp fragment, but the actual fragment used 
was a 300 bp PCR fragment. 

Homozygous cell lines results 

Figure 4 shows the results from a cell line homozygous for 
the DQB 0204 allele. The pattern shown in the image is very close or 
similar to the expected results from exon 2. 

In all reactions the control primers worked well and the four 
dyes were used in the same frequencies. In the case with a 500 bp 
fragment for DQB typing, the primers for allele 0402 were placed in such a 
way that they formed figures. In Figure 4, panel D, most signals are seen 
forming a "2" from the 300 bp fragment, and the missing signal will be seen 
when the large PCR fragment is used. This clearly shows that primers can 
be placed in a clever way to form figures. 

Heterozygous results 

For the heterozygous test only one of the four dye reactions 
worked. Some of the expected spots from the heterozygous sample were 
not seen, but this is probably due to the fact that no control signals were 
seen in the lower right-hand corner, where the signals were weaker than in 
other parts of the slide. 

As this experiment shows, a limited number of primers can be 
used for HLA typing and, if they are placed in a clever way, the interpretation 
of the results is very simple. Both homozygous and heterozygous samples 
can be correctly analysed with this method. 

Continuation 

An algorithm was developed in order to select the minimum 
number of primers needed to identify different genes using APEX. It was 
applied to the following HLA genes: HLA-A, HLA-B, HLA-C, HLA-DPA1, 
HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1 and HLA-DRB345. It was 
also applied to the 16S rRNA gene. In the case of HLA-DQB1, the primers 
have been shown to work as intended. As is, a few assumptions were 
made (such as how many mismatches to allow between the primers 
and the sample DNA) that need to be tested and possibly refined. 

Another improvement that can be made is the following: as is, 
the program works only with discrete signals, e.g. either there is a signal 'A' 
or there is not, either there is a signal 'G' or there is not, and so on. A more 
precise approach would be to predict how strong the signals will be for 
each primer on each sequence. A rough estimate of the signal strength 
should be possible given some thermodynamic data about the primers, 
most notably their melting points. With this information, and knowing the 
concentration of DNA in the sample among other things, the proportion of 
primers on the chip that will actually react with the sample DNA should be 
possible to estimate. It would thus allow a rough estimation of what 
strength the different signals will have. It will not be very precise, and the 
estimate might possibly be off by a factor of 2 or more, but it will still give 
some information about what signals to expect from the chip. 

Given the melting points of the primers, the temperature at 
which the reaction on the chip is carried out could be optimised as well. 
Since the sequences are known, it is possible to estimate the melting point 
of any primer to any sequence when there are a few mismatches. This 
could be done for all primers on all sequences, and a range of 
temperatures calculated. The actual temperature to use could then be 
chosen so as to be as optimal for as many primers on as many sequences 
as possible, instead of, as now, a standard temperature. 

Another possibility would be to try other heuristics to solve the 
resulting SCP. Even though CFT does give better results than the greedy 
algorithm, it is not by much. It could be that Lagrangian relaxation methods 
really are not suitable for unicost problems, but the only way to find out is to 
try heuristics based on other ideas. It might be possible to reduce the 
binary SCP matrix as well, before applying any heuristic to it. Some rows 
in the matrix could end up the same, in which case one of them could be 
removed in order to reduce the number of rows and thus speed up 
computation. No figures of how many rows might be the same exist, but it 
could be worthwhile examining this possibility to reduce problem size. 
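
The suggested reduction of the binary SCP matrix can be sketched as follows. Identical rows impose identical covering constraints, so only one copy of each needs to be kept. The function name and the list-of-lists representation are assumptions made for the example.

```python
def drop_duplicate_rows(matrix):
    """Keep the first copy of every distinct row of a binary matrix."""
    seen = set()
    unique = []
    for row in matrix:
        key = tuple(row)            # rows must be hashable to be compared quickly
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


matrix = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 1],   # same as the first row
]
print(drop_duplicate_rows(matrix))
```

Each row is hashed once, so the reduction takes time proportional to the size of the matrix.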

The algorithm itself could be improved. The complexity of the 
redundancy-check phase can be slightly reduced by having a vector 
consisting of the sums of the rows in each node. For each child-node, the 
column to be removed is then subtracted from this vector of sums. This 
operation can be carried out in O(m), and the final complexity will then be 
O(m × N(p, p)) instead. For the greedy algorithm, another possible 
improvement is to check the primer set for redundancy each time a primer 
is added. The complexity for the greedy algorithm will be the same, as 
the check will take O(m × p) (i.e. the same as each iteration in the greedy 
algorithm) each time (with the improvement just mentioned). The check 
could take longer, but that is unlikely, as that would imply that one primer 
could make several other primers redundant. The main advantage is, of 
course, that no redundancy check with its rather high complexity is needed 
afterwards. 
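
The idea of keeping a vector of row sums and subtracting a column from it can be sketched as follows. The sketch is a simplified single pass over the chosen columns and not the node-based search referred to above; the function name and data layout are assumptions made for the example.

```python
def remove_redundant(columns, n_rows, times):
    """Drop columns whose removal leaves every row covered at least `times` times.

    columns -- list of sets of row indices, in the order the columns were chosen
    """
    cover = [0] * n_rows                 # vector of row sums over all chosen columns
    for col in columns:
        for i in col:
            cover[i] += 1
    kept = []
    for col in columns:
        if all(cover[i] - 1 >= times for i in col):
            for i in col:                # subtract the removed column from the sums
                cover[i] -= 1
        else:
            kept.append(col)
    return kept


chosen = [{0, 1}, {0}, {1}, {0, 1}]
print(remove_redundant(chosen, n_rows=2, times=2))
```

Testing one column touches at most m entries of the vector, in line with the O(m) cost per removal mentioned above.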

The most serious problem is the sheer size of the problems.
For the 16S rRNA data set, around 300 MB is required just to store
all the primers and their signals. Add to that the fact that all primers need to
be traversed once for every iteration of the greedy algorithm, and the result
is that it will take quite some time as well. This also means that it is not
even feasible to use more elaborate algorithms such as the CFT algorithm
on the 16S rRNA data set, unless a much more powerful computer is
available. On the other hand, algorithm CFT would probably benefit quite a
lot from a parallel computer, since much of the computation could be carried out
as vector-operations. It should then be possible to spread all
computations over several processors, thus reducing the time required. It
would also reduce the memory requirements on each processor (but then
parallel computers tend to have enough memory to store all necessary data
for this problem on each processor anyway). Even the greedy algorithm
would benefit from a parallel computer, as each processor can be charged
with the task of scoring only a subset of primers. It is not as critical in this
case, though, since the computation times are not very high when using the
greedy algorithm.
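
The division of the scoring step among processors can be sketched as below. The example is hypothetical: the function names, the round-robin split of the primers and the use of Python's standard thread pool are assumptions made for illustration. A thread pool is used only to show the partitioning; an actual speed-up would presumably require separate processes or a genuinely parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(matrix, uncovered, columns):
    """Score one subset of primers; return (best gain, best column) for it."""
    best_col, best_gain = None, 0
    for j in columns:
        gain = sum(1 for i in uncovered if matrix[i][j])
        if gain > best_gain:
            best_col, best_gain = j, gain
    return best_gain, best_col


def best_primer_parallel(matrix, uncovered, workers=2):
    """Find the primer covering most uncovered rows, scoring subsets separately."""
    n = len(matrix[0])
    # Round-robin split: worker k scores columns k, k + workers, k + 2*workers, ...
    chunks = [range(k, n, workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda cols: score_chunk(matrix, uncovered, cols),
                                chunks))
    gain, col = max(results, key=lambda r: r[0])
    return col, gain


scoring_matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
first_choice = best_primer_parallel(scoring_matrix, set(range(6)))
```

Each worker only needs the columns of its own subset, which is the reason the memory requirement per processor could also go down.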

As is, this method is only capable of identifying known gene-variants.
If applied to a sample with a previously unknown variant, it is very
probable that this new variant will be falsely identified as one of the known
variants. It would be very advantageous if this method could be
augmented in some way to recognise this fact, and give a warning if there
could be an unknown variant in the sample. This could be done by giving a
warning when the signal pattern obtained differs from the signal patterns of
all known variants, but this might not be enough. There is no guarantee
that the new variant does not differ only in some place not affecting any of the
existing primers, which would make the new variant
indistinguishable from one of the known variants. Some other way is
probably needed as well.
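
The simple warning described above could look roughly like the following sketch. The function name, the dictionary of known signal patterns and the variant names are hypothetical and serve only to illustrate the comparison.

```python
def match_signal_pattern(known_patterns, observed):
    """Compare an observed signal pattern with the patterns of known variants.

    known_patterns maps a variant name to its expected 0/1 signal per primer.
    Returns (matches, warning): the names of all variants whose pattern equals
    the observed one, and a warning string if there are none.
    """
    observed = tuple(observed)
    matches = [name for name, pattern in known_patterns.items()
               if tuple(pattern) == observed]
    if not matches:
        return [], "warning: signal pattern matches no known variant"
    return matches, None


# Hypothetical patterns for three known variants and four primers.
known = {
    "variant-1": (1, 0, 1, 0),
    "variant-2": (0, 1, 1, 0),
    "variant-3": (1, 1, 0, 1),
}
hit = match_signal_pattern(known, (0, 1, 1, 0))
miss = match_signal_pattern(known, (1, 1, 1, 1))
```

As noted in the text, such a check only catches new variants that change at least one signal; a variant that differs at a position not probed by any primer would still be reported as a known one.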
APPENDIX 1

Primer sequences for DQB1 heterozygote typing
Primers 'dqb1-1' to 'dqb1-8' placed in positions A3-A10.
Primers 'dqb1-9' to 'dqb1-18' placed in positions B2-B11.
Primers 'dqb1-19' to 'dqb1-30' placed in positions C1-C12.
Primers 'dqb1-31' to 'dqb1-42' placed in positions D1-D12.
Primers 'dqb1-43' to 'dqb1-54' placed in positions E1-E12.
Primers 'dqb1-55' to 'dqb1-66' placed in positions F1-F12.
Primers 'dqb1-67' to 'dqb1-76' placed in positions G2-G11.
Primers 'dqb1-77' to 'dqb1-84' placed in positions H3-H10.
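
The placement listed above can be expressed as a small lookup, sketched here in Python. The table and the function name are illustrative only, and the sketch assumes that the primers within each row are placed in ascending order from the first to the last stated position, which the list above suggests but does not state explicitly.

```python
# (plate row, first column used, first primer number, last primer number)
PLATE_LAYOUT = [
    ("A", 3, 1, 8),
    ("B", 2, 9, 18),
    ("C", 1, 19, 30),
    ("D", 1, 31, 42),
    ("E", 1, 43, 54),
    ("F", 1, 55, 66),
    ("G", 2, 67, 76),
    ("H", 3, 77, 84),
]


def well_for_primer(number):
    """Return the plate position (e.g. 'B2') of primer dqb1-<number>."""
    for row, first_col, first, last in PLATE_LAYOUT:
        if first <= number <= last:
            return f"{row}{first_col + (number - first)}"
    raise ValueError(f"no position listed for primer dqb1-{number}")


corner_positions = [well_for_primer(n) for n in (1, 8, 9, 18, 77, 84)]
```

With this layout the 84 primers occupy all positions of an 8 x 12 plate except the outermost corners of rows A, B, G and H.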

dqb1-1 NH2 - TCC ATC ACA GGA GTC AGA AAG GGC T
dqb1-2 NH2 - GTG TGC AGA CAC AAC TAC GAG GTG G
dqb1-3 NH2 - GCG GTG ACG CTG CTG GGG CTG CCT G
dqb1-4 NH2 - TAA TGA GGG GGG TGG ACA CAA CGC C
dqb1-5 NH2 - GCG GTG ACG CCG CTG GGG CCG CCT G
dqb1-6 NH2 - GGA CAT CCT GGA GGA GGA CCG GGC G
dqb1-7 NH2 - GTG GTG ACG CCG CTG GGG CCG CCT G
dqb1-8 NH2 - TCC GTC AAA GGA GTC AGA AAG GGC T
dqb1-9 NH2 - GAT GTA TCT GGT CAC ACC CCG CAC G
dqb1-10 NH2 - CCG AGT ACT GGA ATA GCC AGA AGG A
dqb1-11 NH2 - GAT GTG TCT GGT CAC ACC CCG CAC G
dqb1-12 NH2 - GGG TGG ACA CAA CGC CGG CTG TCT C
dqb1-13 NH2 - GGG TGG ACA CAA CGC CGG TTG TCT C
dqb1-14 NH2 - CTT CTG GCT ATT CCA GTA CTC GGC G
dqb1-15 NH2 - TTC CGG GCG GTG ACG CTG CTG GGG C
dqb1-16 NH2 - GCT TCG ACA GCG ACG TGG GGG TGT A
dqb1-17 NH2 - GCT GTT CCA GTA CTC GGC GCT AGG C
dqb1-18 NH2 - CTT CTG GCT GTT CCA GTA CTC GGC G
dqb1-19 NH2 - ACC GTG TCC AAC TCC GCC CGG GTC C
dqb1-20 NH2 - CAC AAC GCC GGT TGT CTC CTC CTG G
dqb1-21 NH2 - CTC CTC CTG GTC ATT CCG AAA CCA C
dqb1-22 NH2 - CCA GGA TCT GGA AAG TCC AGT CAC C
dqb1-23 NH2 - GAG CGC GTG CGT CTT GTA ACC AGA T
dqb1-24 NH2 - GAC ATC CTG GAG AGG AAA CGG GCG G
dqb1-25 NH2 - AGA GAC TCT CCC GAG GAT TTC GTG T
dqb1-26 NH2 - TAG TTG TGT CTG CAC ACC CTG TCC A
dqb1-27 NH2 - ACG TAC TCC TCT CGG TTA TAG ATG T
dqb1-28 NH2 - GCT TCG ACA GCG ACG TGG AGG TGT A
dqb1-29 NH2 - TCC GTC CCA TTG GTG AAG TAG CAC A
dqb1-30 NH2 - TGA TAA GGC CCA GCC CGA GGA AGA T
dqb1-31 NH2 - GGG TGG ACA CAA CGC CAG TTG TCT C
dqb1-32 NH2 - GGG TGG ACA CAA CGC CAG CTG TCT C
dqb1-33 NH2 - GAC AGC GAC GTG GAG GTG TAC CGG G
dqb1-34 NH2 - TCC GTC CCG TTG GTG AAG TAG CAC A
dqb1-35 NH2 - GCA CGA CCT TGC AGC GGC GAC CCC A
dqb1-36 NH2 - GAA CAG CCA GAA GGA AGT CCT GGA G
dqb1-37 NH2 - CTT CTG GCT GTT CCA GTA CTC GGC A
dqb1-38 NH2 - AAC GCC AGC TGT CTC TTC CTG GTC A
dqb1-39 NH2 - GAG AGG ACC CGG GCG GAG TTG GAC A
dqb1-40 NH2 - GCA GGC GGC CCC AGC GGC GTC ACC A
dqb1-41 NH2 - GTC GCT GTC GAA GCG CAC GTC CTC C
dqb1-42 NH2 - CTC TGT CCT GGA TGG GGT CGC CGC T
dqb1-43 NH2 - ACG GGA CGG AGC GCG TGC GTT ATG T
dqb1-44 NH2 - GAA GTA GCA CAT GCC CTT AAA CTG G
dqb1-45 NH2 - TCG GTG GAC ACC GTA TGC AGA CAC A
dqb1-46 NH2 - GGA M CGT GTA CCA GTT TAA GGG C
dqb1-47 NH2 - ACG TAC TCT TCT CGG TTA TAG ATG T






dqb1-54 NH2 - TTA AGG CCA TGT GCT ACT TCA CCA A
dqb1-55 NH2 - TTC AGA TTG AGC CCG CCA CTC CAC O
dqb1-56 NH2 - ATC TGG TCA CAA GAC GCA CGC GCT C
,0 dqbl -57 NH2 - AGT AGC GGC CCTJAA ACT GGT A 
dqbl -58 NH2 - ATG TAT CTG GTC AC A CCC CGC ACQ A 
dqb1-59 NH2 - ATC TGG TCA CAT AAC GCA XGCGCT C 
dqb1-60 NH2 - ATC AAA GTC . CAG > TGG M CGG AAT _G 

„ dabl-62 NH2 * ATC AA^ G^TC CGG^TGG M^GGAAT 
15 dqbl -63 NH2 - GTA TCT GGT CAC ACC CCG ^^^^G^^G C 
dob1-64 NH2 - CGC TGT CGA AGC ; GCACOT CCT CCT C 

JomS HhI GM GTA GCA CAG GCC CTT AAA CTG G 

dqb1-71 NH2 - TCG ACA GCG ACG TGG GGG TGT ACC O
dqb1-72 NH2 - TCG ACA GCG ACG TGG GGG AGT TCC G

dqb1-76 NH2 - GCG TTG GAG GCT TCG TGC ; TGG i GGCT
dqb1-77 NH2 - CGG TGA CCC CGC AGG GGC CGC CTG A
dqb1-78 NH2 - ATG GGA CGG AGC GCG TGC GTT ATG T
dqb1-79 NH2 - CGG TGA CGC CGC TGG GGC GGC TTG A
da>1-80 NH2 - ACG <&™™?^JF££J£t 

35 SStS KSI - CGT TGT CGA AGC GCA CGT CCT £ 
dqb1-84 NH2 - GAC TCT CCC GAG GAT TTC GTG TAC C 



APPENDIX 2
Homozygotes



(From CFT if available, otherwise greedy algorithm). 



DPA1 

45 TTTTTTTTTTTTGCCCAGGGCACAG 
T I I I 1 1 1 1 1 I I I AAGGAAAAGGCTC 
II I IIIIIIIMI GGATCTGGACAA 
TTTTTTTTTTTTCTGGCCCAGCTCC 
1I IIIIUII I II IGTACAGACCCA 
50 1 I 1 1 1 1 1 1 1 M rAGGGGACCCTGTG 
| I I I I 1 1 1 1 I I I GGCGGACCATGTG 
TTTTTTTTTTTTCTGCTCATCTTCA 
| | | I II II I I rTGTCAACTTATGCC 
I I H I | | | I I I I I CAGGCCGCCAAT 






DPB1 

H 1 1 1 1 I | I I I I CAACCGGGAGGAG 
1 1 1 1 1 I N I I I I GGCCTGACGAGGA 
I I I I I I I 1 1 1 I I CAACCTGGAGGAG 
5 | | 1 I I 1 1 Ml I I I CCAGTACTCCTC 
I I I I I II I I I I I I GCCGTAACTGGT 
1 1 1 1 1 1 1 1 1 II I I 'GGGGCGGCCTGA 
I I I | I I I I I I I I GCGCGTACTCCTC 
| 1 1 I I I I I I I I I I GGACAGGAGGAA 
10 1 1 1 1 I I I t I I I I CACAGGAGGAGCA 
IIIIII1IIII I U GCTCCTCCTGT 
nTTTTTTTn I GGCAATGCCCGCT 
I 1 1 I I I I I I I I I GGCACTGCCCGCT 
H II II 1 1 I II I AGAGAATTACGTG 
15 1 I I I I I I I I H I I CCAGAGAATTAC 
I I 1 1 1 I 1 1 I 1 I I A ACTACGAGCTGG 
IIIIIH I IIII GGTCATGGGCCCG 
1 1 I 1 1 I I I I I I I I GACCCTGCAGCG 
TTTTTTTTTTTTTACACGTAATTCT 
20 1 1 1 1 1 1 1 1 I I I I GTAACTGGTACAC 
1 1 1 1 II 1 1 1 I I I CTGACGAGGAGTA 

I || | I I I I I I I I I I ACCTTTTCCAG 

I I I I I I I I II I I CCTGGAAAAGGTA 

I I | I I I I I I I I I GAGAATTACCTTT 
25 I II I n 1 1 1 I I I GCCTGACGAGGAG 

1 1 1 1 1 1 1 1 1 1 1 I A CTGGTGCACGTA 

I I I I I I I I I I I I r CCTCCAGGATGT 

1 1 H 1 1 1 1 1 I I I CGGGAGGAGCTCG 
1 1 1 1 1 1 1 I I II I AGCCAGAAGGACA 

30 1 1 1 1 1 1 1 1 1 1 1 I CAGCCAGAAGGAC 
1 1 1 1 1 1 1 I I I I I AGTGCCGGACAGG 
T l I I 1 1 1 I I I I' TATTGCCGG ACAGG 
IIIIMIIIIII CCTGCAGCGCCGA 
1 1 1 1 1 1 1 I I I I I AGAGAATTACCTT 

35 IIIIIII II III GGACTCGGCGCTG 

I I I I I I I I I I I I ACTACGAGCTGGG 
1 1 1 1 1 1 1 I I I I I GCTTCGTGCTGGG 
1 1 1 1 1 I I I I I I I GTCCCTGGTACAC 

I I 1 1 1 1 1 II I I I GCGCTGCAGGGTC 

DQA1

I I I I M I I I I I I ACATCCTCATCTG 

I I 1 1 1 1 1 1 1 I I IACACCCTCATCTG 
I I | | I I I I I I I I CAAGTTTACACCA 
| I I II I I I I I II CAGCCACAATGTC 

45 I I I I II I I I I I I I CCAAGTCTCCCG 
I I I 1 1 1 1 1 1 I I I CGGGAGACTTGGA 

I I I H I I | I I I I AATTCATGGCTGT 
TTTTTTTTTTTTACAATCCCAGGGC 
T I I I 1 1 I I I I I I ACAACCCCAGGGC 

50 I I I I I I I I I I I I GTGGGCATTGTGG 
TTTTTTTTTTTTCCAACACCCTCAT 
I I I I I I I I I II I 'GGCCCACAGACAA 
I I I I I I I II I I I CATGGGCATTGTG 
I I I I I I I I I I TTGGCCTGGATGAGC 

55 I 1 1 1 1 1 I I I I I I AGGCTCATCCAGG 
I I I I I I I I I I I I CAACACCCTCATT 
I I I I I I I I I I I r AGCACTGGGGACT 
n I I I I I I I I I I AAGGGCCATTGTG 
TTTTTTTTTTTTAAATTCATGGGTG
1 | u | | 11 1 I 1 ) CACCATAAGAGGC 
TTTTTTTTTTTTCACCACAAGAGGC 
I I I 1 I I I 1 1 1 I 1 CACCGTAAGAGGC 
5 II I II I I I I 1 I 1 1 I CCTCCCTTCTG 
MINIMUM I AACTCTCCTCAG 
I i i | | | 1 | 1 n I I A AATCTCATC AG 
TTTTTTTTTTTTCTCCTCCCTTCTG 

DQB1 

10 iTTTTTTTTTi 1 ATCTTGCAGAGGA 
TTTTTTTTTTTTCCTCTGCAGGATG 
TTTTTTTTTTTTGGGTCACCGCCCG 
TTTTTTTTTTTTGGGAGTTCCGGGC 
MINIMUM CGCTCGGGTCCTC 
15 | M I I I M I I I I CCAGTACTCGGCG 
11IIIUIII1 r CTGGGGCCGCCTG 
TTTTTTTTTTTTATGTCTACACCTG 
TTTTTTTTTTTTAAAG G G CTTCTGC 
U | M I I I I I I I AGCATCACCAGGA 
20 11 I II I I I I n I GCCAGGAGGAGAC 
p| I M I I I I II I ACCAGGAGGAGAC 
M M I I I M M I GGTTTCGGAATGA 
M I I 1 1 I I M I I GGGTGTATCGGGT 
1 1 I M M I I M I GTCGGAAAGGGCT 
25 | M 11 M M II I T GGTTTCGGAATG 
M M 11 I I I M I CCAGTACTCGGCA 
M I I I I M I M I A GCGCACGATCTC 
M M II II I M I GTCTCTTCCTGGT 
M M 1 I M I M I CGTCAAGCCGCCC 
30 M 1 1 I I I M M I GCGTCAAGCCGCC 
M I I I M 1 I M 1 CAAGGTCGTGCGG 
M I I n I I I II I CGGTTATAGATGT 
n I I I I 1 I I 1 1 I I GTAACCAGACAC 
ITTTTTTTTTi 1 GTATGCAGACACA 
35 M II M M M II GACACCCCGCACG 

I 1 I I I I I I I I 1 1 ACACCCCGCACGC 

DRB1 

I I M I M 1 1 M I GCAAGTCCTCCTC 
M I II I HMIIM CTCCTCCCGGT 

40 M I I M I I I I T TCCACAACCCGGTA 
I I I I n I I I I I I GGCCAGGTGGACA 
I I M I I M I M I GCGGTTCCTGGAG 
M II I I I I n I I CAGCCAGAAGGAC 
U I I I I I I I I I I GACTCGCCTCTGC 
45 II I I I I I I I I I I I CCAGGACTCGGC 
ITTTTTTTTTI I GAAATAACACTCA 
M I I I I I I I I I I I GGAGGACAGGCG 
M I I I II I I I 1 1 A CGTGGTCGGGTG 
M M I M II I M I ACTCCAAGAAAC 

50 I I M II M II I I ACGGTGTCCACCT 
I I M M M I M I GGAGAGGTTTACA 
M M II M I I I I CCAGTACTCGGCA 
IIIIIIIHII I GGAGTACTCTACG 
I M I M II M I I GTGTAAACCTCTC 

55 M I I M M I I M CGGTGCAGCGGCG 
I M M I I M II I GGAGGAGTTCCTG 
M M II I M I II l GGAAGACGAGCG 
T M I I I II M I I CAGGAGGTTGTGG 
1 | il 11 I n 1 I 1 GACAGGCGCGCCG
U 1 | | | 1 n I I I CCGTTCAGGAACC 
H | M H I 1 11 1 GGAATCCTCTTGG 
M I I 1 I 1 I i I I 1 GCCACAAGAAACG 

5 rTTTTTTTm I ACGTTTCTTGG AG 
TTTTTTTTTTTTCGGACTCCTCTTG 
TTTTTTTTTTTTTACGGGTGAGTGT 
1 I I I I H I 1 I 1 I CCAGGAGGAGTTC 
TTTTTTTTTTTTGTAATTGTCCACC 

10 TTTTTTTTTTTTTCGTAGCGCGCGT 
M 1 1 | I 11 1 1 1 1 AAGATGCATCTAT 
" MIN I MUM [ ACGTCTGAGTGT 

minimi i ccagtactcagca 

TTTTTTTTTTTTCGTAGCGCGCGTA 
15 TTTTTTTTTTTTATCTCTCCACAAC 
rTTTTTTTTTI 1 GAGCTCCTCCTGG 
TTTTTTTTTTTTAACCAGGAGGAGT 
M I II M i I M I AGGGCCCGCCTGT 
1 1 1 11 M 1 1 11 1 GGAGAGCTTCACA 
20 1 1 1 Ml M 1 M I GGAGAGATTCACA 
M M I II I 1 I II I CACCGCCCGGTA 
M M I II M M 1 A ACTACCGGGTTG 
rTTTTTTTTTI I CCAGTACTGGGCA 

DRB345 

25 | M II I 1 I I I 1 1 GTATCTGTCCAGG 
M II I I I I I I I I GACTGGGGTGGTG 
M I I II I 1 1 1 1 I CTGTCGAAGCGCA 
TTTTTTTTTTTTGTGTAAACCTCTC 
M 1 1 M I I 1 1 1 I CTGTGAAGCTCTC 
30 M I I M I II 1 1 I CACCAGGGCCCGC 
M II II I I 1 1 1 I GGCCAGGTGGACA 
rTTTTTTTTTI 1 GCGGTTCCTGGAG 
TTTTTTTTTTTTTCGAAGCGCGCGT 
M 1 I I I 1 I n I 1 1 A ACCAGGAGGAG 

35 I 1 I I I M II I 1 1 ACGTGGTCGGGTG 
1 1 M 1 II I 1 I I I AGGGCCCGCCTGT 
M M I M 1 1 1 1 I GGGCCCGCCTGTC 
I I M I I 11 1 I 1 1 AACTACGGAGTTG 
M 1 I 1 1 I 1 1 1 1 I GGGGCCGGGCTGT 

40 11 n II II I II 1 GACCATGTTTCTT 

I 11 II I II I M V CTGTGCAGGAACC 
TTTTTTTTTTTTGGCCGGGCTGTTC 

I I I M 1 I I I I 1 1 A CATCCTGGAAGA 
M I IMM II H CTCACGAGTCCTG 

HLA-A

1 1 I I I I II M I I I CAGTCTGTGAGT 
M I M 1 11 1 I I 1 AGACGCATATGAC 
M 1 11 M 1 1 I I 1 GGACGCATATGAC 
1 | M M 1 M I 1 1 GGTCGCCAGGTCC 

50 1 1 1 I II U I 1 1 I CCGCAGGCTCTCT 
M I I I I M II I I I CCTCCTCCACAT 
M 1 1 1 1 M II 1 I CCGAACCCTCGTC 
M M 1 l M M I I ATTTCTCCACATC 
TTTTTTTTTTTTGGCGGACATGGCG 

55 I I I 11 1 I I M M CCAGAGCGAGGAC 
| M M M M 11 M 1 CACCACATCCG 
M I 1 M M I II I GGGAGCCTGCCCA 
M 1 M M M M I I GATGTGGAGGAG 
TTTTTTTTn IT GGAGGAGGAACAG
1 1 1 1 1 1 I 1 1 1 1 1 AGTCATATGCGTC 
TTTTTTTTTTTTGGTCTGCCCGAGC 
TTTTTTTTTTTTAAACCTGCCATGT 
5 TTTTTTTTTTTTCCGGGACACGGAA 
1 1 1 1 1 1 1 1 1 1 1 1 CGTCCTGGGGGGG 
TTTTTTTTTTTTCCGCTGCCAGGTC 
I I I I HI I I I I I ATGCGTCCTGGGG 
U | ti | | | | | | IA TGCGTCTTGGGG 
10 I II II 1 1 1 1 H 1 GGAGAAG AGATAC 
HHI I MIIII GGGAGCCCGCCCA 

I | | I I I I I M I I CCGCAGGTTCTCT 
nilHHII I I GCGCAGGTCCTCT 
H | || | | | n I I GGGCGGGCTCTCA 

15 1 1 1 1 1 1 1 1 1 1 1 1 CCAGGACACGGAG 

I I II I 1 I I I I I I CCGGCAGTGGAGA 
| | | | I I I I I I I I AGGAGACAGGGAA 

I I I 1 I I I I I 1 1 T GTCAATCTGTGAG 

I I 1 1 I I I 1 1 1 1 I AGAAGTGGGTGGC 
20 IIIIIIIHIH CAGGTAGGCTCTC 

Hllllllim CGGACGCCCCCAA 
IHIHIIIIIII CAATCTGTGAGT 
I IIIIIIMIHI GAAGGCCCAGTC 
1 1 1 1 1 1 1 1 1 1 1 I 'CGTCGTAAGCGTC 

25 I I 1 1 I I I 1 1 I I I A ACCAGAGCGAGG 
1 1 1 1 1 1 1 1 1 1 1 1 I GACGGTCATGGC 
Hllinilllll GGACCTGGCGAC 
1 1 I 1 1 1 1 1 1 I I r G AGAGCCCGCCCA 
IHIIHIIIIH CATATTCCGTGT 

30 1 1 1 1 I 1 1 1 I I I I GGGAGACACGGAA 
1 1 1 1 1 1 1 1 I 1 1 I GTCCACTCGGTCA 
n II I I I 1 1 1 1 I CCGTGTCTCCCCG 
HIIIHIIII IGCTGCCACGTGGG 
1IHIHIIIII CGAACTGCGTGTC 

35 1 1 | I I I I I I 1 I I GGTAGGCTCTCAA 

I I 1 1 1 I 1 1 I I I I 'AGGTCCACTCGGT 

I I I I I 1 1 M 1 1 I GTCCTGGGGGGGT 
1 1 1 1 I I 1 1 1 1 1 I GCTGCTCCGCCGC 
1 1 1 II I 1 1 1 1 1 1 GGGGCGCCATGAC 

40 1 1 1 1 1 1 1 1 1 U I GCGCGATCCGCAG 
HUIHIIIII GCACATGGCAGGT 
| I 1 1 I I I I I I I I A GGAGAAGAGATA 
1 1 1 1 1 1 1 1 1 1 1 TAGGAGCAGAGATA 
H 1 1 1 1 1 1 1 1 1 l -CCACTCCACGCAC 
45 1 1 1 1 1 1 1 1 1 I r TCCCGTCCACGCAC 
TTTTTTTTTTTTCACGTGCCATCCA 
HIHIHHII CCCGGCCCGGCAG 
I HHIIIIIII CACGTCGCAGCCA 
milUIIIII ACGTCGCAGCCAT 
50 n || | | 1 1 I I H ACGTGGCAGCCAT 
HIHIIIIIIIA TCCAGAGGATGT 
UI I IMIII I I CGAGCTCCGTGTC 
TTTTTTTTTTTTACC AGAGCGAG G A 
| I I I I I I I I I I I A TGAACAGCACGC 
55 1 1 I I I I I I I I I I r CACACCCTCCAG 
I I | I I I I I I I I I CTACGTGGACAAC 
HLA-B

1 1 1 I 1 1 I 1 1 1 1 1 GGATGGCGCCCCG 

I I | I I I I I I I I I CGGCTCAGATCTC 

1 1 1 1 1 1 1 1 1 1 1 1 1 CGGGGCGCCGTG 
5 IIMIIIIIIII CTCCACTGCTCCG 
1 1 1 I 1 1 1 1 1 1 1 1 1 GTGTTGGTCTTG 

I I | | I I 1 I II 11 GGGTATGACCAGT 
1 1 I I I I I I I I I I I CCAGGTGATGTA 
1 1 1 1 1 1 1 1 1 1 1 I GTCCTGCTCCGCC 

10 1 1 1 H 1 1 1 1 1 1 1 1 GTAGTAGCGGAG 
IIIIII I IIIII GCTCAGGTCCTCC 
I | | M I I I I 1 1 I ACCAACACACAGA 
I I I I I I I I I I I I CCGTCGTAGGCGT 
I HI 1 1 1 1 1 1 1 1 GTGAGCCTGCGGA 

15 1 1 1 1 I I 1 1 1 1 1 1 ACATCATCCAG AG 

I I I I I I I I 1 I 1 1 GGTTCTCTCGGTA 

I I I I I II II I I I 1 GATGTGTCTCTC 
|| 1 1| | 1 1 1 1 1 1 GCGCCATGACCAG 

I 1 1 1 1 1 I 1 1 I 1 1 GGCGTCCTGGTCA 
20 I I I II I I 1 1 I I I A GGAGGACCTGAG 

1 1 1 II I I 1 1 I I IGCGCCAGGCACAG 

I I 1 1 1 1 I I I 1 1 1 AGGAGGGGCCGGA 
1 1 1 1 1 1 Ml 1 1 I CCGCTGCTCCGCC 
m I 1 1 I I I I I rACACCATCCAGAG 

25 TTTTTTTTTTTTCACACAGATCTAC 
I I 1 1 I I I 1 1 1 1 I GGGCATGACCAGT 
1 1 1 1 1 1 1 1 1 1 1 I CACACAGATCTCC 
I H 1 1 1 1 1 1 1 1 1 GCGAGTGCGTGGA 
III1IIIIIIIII GGTACCCGCGGA 

30 I 1 11 1 1 1 1 1 1 1 1 CCTGTGCGTGGAG 
1 1 1 1 1 1 I 1 1 1 1 1 A GACACAGATCTT 
I 1 1 1 1 1 1 1 1 1 1 I CAGCGACGCCACG 
11 1 1 I I I 1 1 1 1 1 CGGGCCGGGACAC 
1 1 1 1 1 1 1 1 1 1 1 I CCCGTCCCAATAC 

35 | 1 1 1 1 1 1 1 1 1 1 1 GGGCATAACCAGT 
1 1 1 1 1 1 1 1 1 1 1 I GCCCCGCTTCATC 

I 1 1 1 1 I 1 1 1 1 1 1 C AGGAGCGCAGGT 
1 1 1 1 1 1 1 1 1 1 1 1 CGTCCACGCACAG 

I I I 1 1 1 1 I 1 1 1 1 GAGTCCGAGAGAG 
40 1 1 1 1 11 1 1 1 1 1 1 GACACAGATCTCC 

1 1 1 1 1 1 1 1 1 1 11 1 A ACCAGTTAGCC 
1 1 1 1 1 1 1 1 1 1 1 1 1 A GGCGTGCTGGT 
I 1 1 1 1 1 1 1 1 1 1 1 GACCCTGCTCCGC 
1 1 1 1 1 1 1 1 I 1 1 1 GGGGCTCCGCAGA 
45 I I I I n I I 1 1 I I CCGGTCCCAATAC 
IIIIIIIIIIU GCGGGTCACGGCG 

I I I 1 1 1 1 1 1 1 I I A GGGCCAGGGCTC 
1 1 1 1 1 1 1 1 n 1 1 ATCCTCTGGAGGG 

1 1 1 1 1 1 1 1 1 1 1 ' TGGCAG ACGATGTA 
50 1 I I I 1 1 I I I I I I AGGCGGAGCAGGA 
I I I 11 1 I I I I I I CAGCTGCTCCGCC 
I 1 1 1 1 11 1 1 1 1 1 ATCTGCGGAGCCA 
I 1 1 1 1 1 1 1 1 1 1 1 CGGAGCTGTGGTC 
II1II I IIIIII CGACCACAGCTCC 

55 I I 1 1 I I I I I I I I GAAGAGTTCAGGT 
I1IUII I II1I CATGTCGCAGCCA 
I I 1 1 1 1 1 II 1 1 I CTGGGCTGGCTCC 
I I I I 1 I I I I I I I CAACACACAGACT 
I I I I I I I I I I I I I GGCGGAGCAGGA 
T il 1 1 1 1 H U I I ATGACCAGGACG
TI MIIIIIUr CCACTGCTCCGCC 
T 1 1 1 H 1 1 1 1 1 1 A TGACCAGGACGC 
■ Ml I 1 1 1 1 1 1 1 GGAGGGGCCGGAG 
5 TTTTTTTTTTTTTGCGTGGACGGGC 
1 1 1 I I I I I I I 1 1 AGATCTGTATCTC 
TTTTTTTTTn I GCGGGTCATGGCG 
T 1 1 1 1 1 1 1 I 1 1 rCCGGGACATGGCG 
1 I 1 1 I I I I I I I rCCACAGCTGTCCA 
10 TTTTTTTTTI I TCGGGACATGGCGG 
TTTTTTTTTTTTCCCGTCCACGCAC 
| || | I I I H I I I GAAGTGGGAGCCG 
nTTTTTTTTTn r CCCAATCCACC 
H 1 1 M M H I > CCCACGATGGGC3A 

K rmTrrrnn i tcccagtccacc 

T 1 II 1 1 I I I 1 1 I GAGATCTGAGCCG 
IHMIMIIUI CCACGCACTCGC 
| | I I I I I I I II r GACAGCGACGCCA 
I IIIHIIIMI CGCCGCGGACACC 
1 1 II 1 1 1 I I 1 1 1 GTAGGAGGAAGAG 
1 | I I I I n I 1 1 I CTTTTCCACCTGA 
TTTTTTTTTn rCACGTCGCAGCCA 
rrTTTTTTTTI I CAGGTCGCAGCCA 
1 1 1 1 1 1 1 1 1 1 1 I CGTAGCCCACTGC 
?S II I I I I I H 1 1 1 ATCCAGGTGATGT 
TTTTTTTTTTTTTCCCAATCCACCG 
TIIIIIIIIIII GGGCGCTTCCTCC 
I 1 1 1 1 1 1 1 1 1 1 T CCCGCTTCATCGC 

I 1 1 1 1 1 1 1 1 1 1 1 CCCCGCTTCATCG 
30 Tlll llllinT C ACACAGACTTAC 

I I M 1 1 I 1 1 I ITA GGACGGTTCGGG 
1 IIIIIM I III CCCCGAACCGTCC 
1 1 1 1 1 1 1 1 1 I f TGAGCTCTTCCTCC 
1 1 1 1 1 1 1 1 I 1 1 1 GCTCCCGAGAGCA 

35 1 I I I I 1 1 1 1 I 1 1 ACTCCATGAGGCA 
1 1 1 1 1 1 1 1 1 1 1 1 'GCTGTGGTGGTGC 
1 1 1 1 1 1 1 1 1 1 M 1 1 GTCCAGAAGGC 

nnnmini GcccGCGGAGGA 

11 1 1 1 II 1 1 1 1 f GCCGCGGACAAGG 
40 ll l l lllllli rCCGCCTTGTCCGC 
TTTTTTTTTTI rCGGGTACCACCAG 

HLA-C 

1 II I II I 11 1 1 1 1 GAGCTGGGAGCC 
1 1 1 1 1 1 1 1 1 1 1 1 'GGTGCAGGGCTCC 
45 I I I I I I I I I I I T GGGTGCAGGGCTC 
1 1 1 1 1 1 1 1 I I I I GAGGCGGAGCAGC 
rnTTTTTTTI I A CGGCGGAGCAGC 
II 1 1 1 1 1 1 1 II I GCGGCGGAGCAGC 
I | 1 1 1 1 1 1 1 n I A GCGCGCGGAACC 
50 1 I I I 1 1 I I I I I I CGGCCCAGGTCTC 
1 II I I 1 I I I I 1 1 1' GGCTCCCAGCTC 
I I | 1 | I I I I I I f GCGCGCGGAACCC 
I I 1 | | I 1 1 I I I I ACGGCTTCCATCT 
HI I IIIIIIII GGTTCGGGGCTCC 
1 1 1 1 I I I I I I I I ACTCCACGCACAG 
| 1 1 1 I I I I I M I I GGAGCAGGAGGG 
TTTTTTTTTn I GCGCGCAGAACCC 
I | | I I I I I I I I I I GAGTCTCTCATC 
1 I I I I I I I I I I I CCTGCAGCCCCTC 






1 1 1 1 1 1 I 1 1 1 1 I CCGCCGTGTCCGC 
| | | I I II I I I I I CCGCTGTGTCCGC 
1 1 1 1 1 II I 1 1 1 1 1 CCAGAATATGTA 
| | | | | H I I II I CGGGGAGCCCCGC 
5 IIIII I IIIIH GCCGTCGTAGGCG 
I 1 1 1 I I 1 1 1 1 1 I CCGCCAGGCACAG 

I I I I I I I I II I I GCGCCAGGCACAG 

I I I I I I I I I I TTGTAGCCGCGCAGG 
| | | | | | | | I It I GCTGGACGCAGCC 

10 1 1 1 II I I I I 1 1 1 1 CCAGTGG ATGTA 

I I I I 1 1 1 1 1 1 1 1 I CCACGCACAGGC 
1 1 U I I I I I I I I GCCGTGTCCGCAG 

I I I II I I I I I I T GAGGGGAGCCCCG 

I I 1 I I I I I I I I I CGTGTCCCGGCCT 
15 1 1 1 I 1 1 II I 1 1 I GGCATGACCAGTT 

IIII I IIIII I I GGTATGACCAGTT 

I I I I I I I I I I II GACAACCAGGACA 

I I I I | | I I I I 1 1 GAATATGTATGGC 

I I I | | | I I I I I I GACAGCCAGGACA 
20 1 1 1 1 I I 1 1 1 1 1 1 CTGGCTGTCCTGG 
1 1| I I I II II I I CTCCTAGGACAGC 
1 I I I I I I 1 1 I I I AGGGCCAGGGCTC 
I 1 1 I I I I II I I I I ATAACCAGTTCG 
| | | n I I I I I I I CATAGGAGGAAGA 
25 TTTTTTTTTTTTTGTGGAGACCAGG 
| 1 1 | I I 1 1 1 II I I GCTCTTCTCCAG 
I I 1 1 I I I I I I I TGAAGAATGGGAAG 
M H I I 1 1 I M I I GCGGAAACTGCG 

16S rRNA 

30 1 1 1 1 1 1 1 1 1 1 1 1 1 A GCCGCCTGCGT 

I I I I I I I I I I I I GGCCGCAAGGCTG 
H | | | | | | I II I GAACTGCCGTTGA 

1 1 1 1 1 1 I II 1 1 1 A GACTGCCGCTGA 

I I 1 1 I I 1 1 1 1 1 1 1 1 ATTCGGAATTA 
35 1 1 1 1 1 1 1 1 1 1 1 1 1 I GCACCCCTTGT 

I I 1 1 I I 1 1 I II I CGCGAGGTTGAGC 
1 1 1 1 I I I II 1 1 1 I ACCCCCCATTGT 

I I I I I I 1 1 1 1 1 r CATTTGATACTGG 
| | 1 1 I I 1 1 1 1 1 r GTGTGCCTAATAC 

40 IIIIIIHIIIII ACGACTTAACCC 

II I I I I 1 1 1 I I I CCCGGCCTTTGTA 

1 1 1 1 1 1 1 1 1 1 1 1 GGGCAAACTGGAG 

I I I n I I I I I 1 1 GATTTGATCCTGG 
III I IIIIMIII GACTCCCGAAGG 

45 1 1 1 1 1 I I I M 1 1 GAAGTCGTAGCAA 

I I I I I I I I I I I I CGCTGCAGAGATG 

II I I I I I I I I I I IA CCCTACCTACT 

I I I I I I I I I I I I GAGGACCTTCGGG 
I II I 1 1 II 1 1 1 1 A AGGGCCATTACC 

50 IIIII I IIIIII GATAAACGCTGGC 
| U I I I I I I I I I GACTAGCTACTCC 
I I I I I I I I I I I I ACATCCGGTGTTA 
I I I I II I I I I I I ATCGCAGGCCTTG 
I I U I I | I I 1 1 1 1 CACCAAGTCGCT 

55 | I I I I I I I I I I I ICCCTCCTTTCGG 
I I H I I I I 1 1 | | || I AAACGCTGGC 
I I | | I I I I I I I I CGAAACCGCAAGG 
I I I I I I I I I I 1 I GCAAGCGTCCTCC 
|| | I I I I I I I I I ACCAAGGACGTTT 
TTTTTTTTTTTTCTAATACCCGGAG
iTTTTTTTTTI 1 ACTTTCAGTGGGG 
TTTTTTTTTTI 1 CTGCGTGAAGTCG 
iTTTTTTTTI I T A ATAGCCCACCAA 

5 TTTTTTTTTTTTAACGGAAACGGGG 
iTTTTTTTTTI 1 GGATTGCACTCTG 
rTTTTTTTTTTI 1 A GCCTTGGGGAG 
lllllllllll 1 CGCCGCATGGCTG 
TTTTTTTTTTI I GCATAAGGGGCAT 

10 TTTTTTTTTTTTTACCACATCTCTG 
TTTTTTTTTTi I GTTACCGCGAGGA 
TTTTTTTTTTI I GGCTTTCAGAGAT 
ITTTTTTTTI I TCGCTGCTTCGCTG 
1 1 M 11 I 1 M I I I A GCGCTACCTTG 

1 5 11IHM1M11 GCA CCACC TGTCA 



l l l l lllllll I CTAATACGGGATA 
I I 11 I 11 11 1 1 I AGGAGAAAGCTTG 
1 11 I 1 I 1 I 1 1 I 1 1 1 AAGAGATTAGC 
20 lllllllllll 1 GTAGCATTCTGAT 
TTTTTTTTTTI r AGGCTTTCCCCCA 
lllllllllll 1 AGAAGTAGCTTGC 
rTTTTTTTTTTI 1 CGCGTATCATCG 
M 1 1 1 11 1 1 I U I I CAGAGATTAGC 
25 TTTTTTTTTTTTTCCGAAAGCGTGG 
rTTTTTTTTTTI 1 ACAACCCGAAGC 
MINIMUM 1 GTCATGGCTCAG 
TTTTTTTTTTTTCGTAGGCTTGGTG 
11 1 I 1 M 1 11 II GTGGAATTCCACG 
30 TTTTTTTTTTTTACGGTTCCCGAAG 
1 M 1 | | 1 1 I M 1 A ACTCGAGTGCGT 
UMII I I I INI GATGTGCTATTA 
TTTTTTTTTTTTAAGCAGGGAGGAA 
llllll l l ll l 1 CTGCTGCAGTGAA 
35 1 1 I I II I I I I I I I I GGGATTAGCTC 
1 1 M 11 11 11 1 TCCTTTGATACTGG 
TTTTTTTTTTTTGGACGCTAGCGGC 
M i lium 11 GTTTACTACCCAC 
iTTTTTTTTTI 1 CGCGATCTCTAGC 
40 TTTTTTTTTTTTTAGGCCGTTCCCC 
ll l llllll l l I ACGCGTTGCATCG 
TTTTTTTTTTI \ GCCCGTCAAGCCA 
l llllllllll I AGTCCCCGCCATT 
1 I 1 1 M M I I I I CTAGCCGTAAGGG 
45 rnTTTTTTTTTT GTCCTTCGGGGG 
TTTTTTTTTTTTAACCAACTCCCAT 
1 M 11 I I I I I I 1 ACTGTGGGTAATA 
M I 11 M 11 11 1 CTGAAAGATGGCG 
M | M II M I I I CGAAAGCCAGGGG 
50 T1 I I II I II II I GTCCGGAATTCTG 
I M M M I M 1 I CAGAAGTGGGTAG 
M M M I I I I M 1 CAGTCCTCATGG 
1 1 1 11 I I 1 I I TTGAAAGAAGCTTGC 
1 M 1 1 1 M M 11 GACCACCTGTCAC 
55 I N I n II U I I n I GGAACTGCAT 
M 11 I I II I M 1 ACAGTTCCCGAAG 
M M M M I M 1 CTCATATCTCTAC 
M I 1 1 M 1 11 1 M 1 CAGTGAGGAAG 
M I I M M II I I ACTGTGAGGAAGG 
60 M 1 M I II M M CCCAGCCCGTAAG 
UHIIIMHI CGTAGCCTTGGTG
I M II I 1 1 1 1 I rATGATGCGTAGCC 
1 1 II 1 1 I 1 1 1 1 1 A GGCAGTGGCTCA 
TTTTTTTTTTTTCAGGACTTAACCC 
5 nniHIIIII GGCCAGGCCGTAA 
1 1 1 1 1 1 1 1 1 U 1 1 CCAACTTCGTGC 
1 1 1 1 1 I I 1 1 1 1 1 GAAGCGTGTGTGA 
TTTTTTTTTTTTCTCCCCCGAAGGT 
I 1 1 I I I I I 1 1 1 1 ATGGG AGTTTGTT 
10 TTTTTTTTTTTTGTGTGCCGTTACC 
M 1 1 I 1 1 I I 1 1 I AGCAGTGAGGAAT 
Hllimilll GCCCCGGTTAACT 
1 1 1 1 1 1 1 1 1 1 1 1 GCACCGGCAGTCA 
I 111 1 1 1 1 1 1 1 I GGACCTTCCTCTC 
15 1 1 1 | I I 1 1 1 H I ACCTAGGTGGGAT 
I 1 1 I I I I 1 1 I I I AATAGCTAATACC 
HIIHIIIII IGCCATATCTCTAC 

rrmTTTm i gccggtggggtaa 

I 1 1 1 I I I I I I II IA CCCCACCTTCG 
20 1 1 1 1 1 I 1 1 1 1 1 1 CAAGGCCTGGGAA 
1 1 1 H M I I 1 1 r CAACCCTGGTGGC 
1 1 1 1 1 1 1 1 1 1 1 r CTAGTCATCCAGT 
HUH III III GGCTGCTGCCTCC 
1 1 1 1 1 1 1 1 1 1 1 I CCCAGAGCTCAAC 
25 || I I 1 1 1 1 I I 1 1 GAAAGCTTGATCC 
1 1 1 1| 1 1 1 1 1 1 1 AACACGCTGGCAA 
HHIIHIIII GAGCTTGCTCCCC 
11 1 1 1 1 II 1 1 1 1 ATTTAGTTGAGCA 
TTTTTTTTTTTTCGACTTAGGCTCA 
30 1 1| || II 1 1 1 1 1 1 I GATGTGCTATT 
IIIIHIIIIU CTTAGGTGCCAGC 
TTTTTTTTTTTTGGCTACAGATCGT 
1 I 1 1 I I I I I 1 1 IA ACTTGCGTGCAT 
TTTTTTTTTTTTGCGATTACGTCAA 
35 1 1 I I 1 1 1 1 1 1 1 1 GGACGTTGGCGGC 
1 1 1 1 1 1 1 1 1 1 1 1 1 GGTGGAGCATGT 
1 1 1 1 1 1 1 1 1 1 1 1 A TAAACCATGCGG 
1 1 1 1 1 1 1 1 1 11 1 AAGAAGTGGGTAG 
H I 1 1 1 1| 1 1 1 1 AACAAGCTAATCC 
40 nil Mill MII CCATGGTTTGAC 
- 1 1 1 1 1 1 1 1 1 1 1 1 AGTAACTGCCGGT 
1 1 1 1 1 1 1 1 1 1 1 1 CAAAAGGGGGCGT 
UIII I IU I U GGCGCTTGCGCTC 
TTTTTTTTTTTTGCTACCTACGTGC 
45 1 1 1| 1 1 1 1 1 1 1 I I GCGAGGTGGAGC 
1 1 1 1 1 1 1 1 1 1 1 1 CGCGAGGTGGAGC 
TTTTTTTTTTTTGCTACCTACTTCT 
I H U 1 1 1 1 1 1 1 1 1 AACACATACAA 
1 1 1 1 1 1 1 1 1 1 1 1 1 GTTGTGAAATGT 
50 1 1 1 1 II 1 1 1 1 1 1 CGTAAAACTCAAA 
1 1 1 1 1 1 1 1 1 1 1 1 1 CAAGGGGCAAGT 
HIMIIIIHII CCAACCTTGCGG 
1 1 1 H II 1 1 1 1 1 GGAGGAACGTGGG 
I 1 1 I 1 1 1 1 1 I 1 1 ATAAGCCTCTCAG 
55 niiiullllll ATGCTAATCCCA 
TTTTTTTTTTTTGATGCTAATCCCA 
H I I I I I I H I I GCCAGTGTTCGTC 

I I I I I II I I I I I GTAAAGGTGGGGA 

I I I I I I I I I I I I I I AACACACCGCC 
60 1 1 1 1 I I 1 1 II I I CCAAGGCGGTGAT 
| 1 M 1 1 I I I I I I GCTACGGCTAACT
I I I I | || I I I 1 1 AGTGGAGCACTCT 
1 I I H I I I I I I I AAGGGTAGCTAAT 
1 I I I I II I I I I I GTCACAGTACGAG 
5 | 1 | | 1 1 | 1 1 1 1 I I GAAAGCACTTTA 
| | I I I I I I I II I GGCGCAAGGCTTA 
I I I I 1 1 I I I I I I GCCTAGGTGGGAT 
M I M I I 1 1 1 1 I GTCCCCACGTTCC 
H | I I I I I I I I I GGCCACAAGGGGA 
10 U M I I I I II I I CTA6CTGTAGGGA 
TTTTTTTTn T TGTGGGCAGCAAGC 
1 I I I I I I 1 1 II I I C GAAAGATTAAA 
I UII1UIIII GGAGTATGGTCGC 
I | 1 1 I I I I I I I I CGAGATGTGAAAG 
15 I I I I I I I H n I GGGCAGGCTAGAG 
1 1 1 1 1 1 1 1 1 1 1 I ACCTCCTGAGCCA 
rTTTTTTTTTn I CCACCGCTACAC 
| | I I I I I I I I I I I I I CAGTCTTGCG 
I 1 1 1 1 1 I 1 1 1 1 1 CTTGACGGGCGGT 
20 m | I I I I I I I I ACGGTAAAAGATG 
IIIIIIIIIIMII CACCCTTGCGG 

I | | | H I I I I I I I AACCAGAAAGCC 
TTTTTTTTTTTTCAACCAGAAAGCC 
1 1 1 I 1 1 1 1 1 1 1 I GTGTCAAAGGCAG 

25 rTTTTTTTTTn I A AGTCCGGATTG 

I I I I I I I I I I I I GCGACATGCTGAT 
1 1 1 1 1 1 1 1 1 1 1 1 ATCAGCCTGCCGC 
TTTTTTTTTTTTGTCGGTAGGGTAA 
H 1 1 1 1 1 1 1 1 1 f GTCGGTGGGGTAA 

30 I M I I I I I I I I I CAACTCATAAG6G 
TTTTTTTTTTTTTTCACTGCTT AAA 
TTTTTTTTTTTTCGCCAGTCCCACC 
TTTTTTTTTTTTCTAGTCATAAGGG 
ITTTTTTTTTT1 CACTGATTTGACG 
35 TTTTTTTTTTTTGGCCACACAGGGA 
I II I 1 II I I II I I I I CCCCCATTGT 
mm HUH IGACCAGAAAGGG 
1 1 1 1 1 n 1 1 1 1 1 ACACTGGGGGATA 
TTTTTTTTTTTTTCAGCCGCCTTCG 
40 TTTTTTTTTTTTGTCGCCAGCTCGT 
I I I I 1 1 I I I 1 1 I CTCATATGAATTG 
I 1 1 1 1 1 1 1 1 1 1 I I GTAAAGGGAGCG 
1 1 1 1 1 1 1 1 1 1 1 1 CGTAAAGGGAGCG 
1 1 1 1 1 1 1 1 1 1 1 1 GGCGGCTCCCTCC 
45 IIIIIIIII1II CAGATGTTCCTCC 
TTTTTTTTTTTTGTCTCACGACACG 
TTTTTTTTTTTTTCAGCCGCCTACG 
TTTTTTTTTTTTTTGTGCTAATACC 
| 1 1 1 I I I I I 1 1 1 CTTGGAACTGCAT 
50 I II 1 1 1 1 1 I 1 1 1 AGTACTCACCCGT 
| | | 1 I I I I I I I I ATTGCTCCATCAG 
IIIIIIII II III GATCCTGAGCCA 
III I IIIUIM AGCAAGTAGAACG 
II I IIIIIIIIM GCAAGTAGAACG 
55 iiin i llllll GATAACCGCAAGG 
1 1 I 1 1 1 1 1 1 I 1 1 GCAAGCGTTTTCC 
I I I 1 1 I I I I I I r GAATACCTCCTTT 

I | I 1 1 I I I I H I ACAGAGCTTTACA 

II I I I I I I II I I I GTCCTTCGGGAG 
60 | n | | | | I I I I I AGGCGGCTTGCTG 
Heterozygotes

From CFT if available, otherwise greedy algorithm. 

DPA1

! I I I I I ! I I I I I GCCCAGGGCACAG 
TTTTTTTTTTTTCTGTTGTTCTATG 
TTTTTTTTTTTT AAG G AAAAG GCTC 
10 1 1 1 I I I I I I t 1 1 ATGAAGATGAGCA 
TTTTTTTTTTTTCACCCTCAGTGAC 

I [ I I I 1 I II I I I GTCAACTTATGCC 

TTTTTTI I 1 I I 1 GCAGGAAGAGGCT 

I n II I 1 1 I I I I I I I GTACAGACGC 
15 1 I I I I I I I I II I CGGTCTCCTTCTT 

1 I I I 1 I I I I M I GCAATGGGGAGCC 

1 I I I I I 1 1 i I 1 1 I GGATCTGGATAA 

1 1 1 I 1 I 1 1 I 1 I 1 1 G ATGAAG ATG AG 

T 1 1 1 I 1 I I I I I I 1 GTTTGTACAGAC 
20 1 I I I II I 1 I I I 1 CGTTTGTACAGAC 

11 11 I I i I I I I 1 CTCAGGCCGCCAA 

I I I I I I I I I I I ICTCAGGCCACCAA 

II I I I I I I I I I I ATGTGGATCTGGA 

I I I I I I I I I I I IACACTCAGGCCGC 
25 I I I I 1 1 I 1 I I I ICACACTCAGGCCG 

I I I I I I I I I I 1 I ICAGGCCACCAAC 

I I I I I I I I I I I I CGTCTGTACAAAC 

I I I I II I I I I I I AG AACATCTCATC 

I 1 1 I I II I M I I AG AACTGCTCATC 
30 I I I I I I I I I 1 I M I GAATTTGATGA 

1 I I I I n I 1 I I I 1 I GAGTTTGATG A 

DPB1 

35 I I I I I I I I I I I I CAACCGGGAGGAG 

I I I I I I I I I I 1 1 CAACCTGGAGGAG 

l I I I I I I II I I I CATCCTGGAGGAG 

I I I I I I I I I n I I GCTGGGGGGTCA 
TTTTTTTTTTTTG GCCTG ACG AGG A 
40 II I I I II I I I I AACTACGAGCTGG 
TTTTTTTTTTTTTCCAGAGAATTAC 

I I I I I I I I I I I I I GCCGTAACTGGT 

I I I I I I I M I M I CCAGTACTCCTC 

rTTTTTTTTTI I AGTGCCGG ACAGG 
45 I I I I I I I I J I I IACCCCCCAGCAGG 

I I I I I I I I I I I I AGAG AATTACGTG 

I I I M I I I II I I ICCAGTACTCCGC 

I I I 1 1 II I I I I I GCATTCCTGCCGT 

I I II I I I I I I I I CGGGAGGAGCTCG 
50 I I I I I I I I II I I CAGCCAGAAGGAC 

I I I I I I I I I I I I ATTGCCGGACAGG 

I II I I I I II I i I CTGCAGCGCCGAG 

I I I II I I I I I I IGCGCGTACTCCTC 

I I I I I I I I I I I I ACAGAATTACCTT 
55 I II I I I I I I I I I I I AAGTGTACCAG 

II I I I I I I I I I IATCCTGGAGGAGA 
I I I I I I I I I I II GGTCATGGGCCCG 
TTTTTTTTTTTTG GGAGGAGTACGC 

I I t I I I I I I I I i I GGGGCGGCICTGA 
11 1 1 I I I t I 1 I I AAAAGGTAATTCT
TTTTTTTTTTi I CTGCCGTAACTGG 
H u 1 n 1 t I I I I I GTGTCTGCATA 
1 I I I I I I I 11 1 1 GGCTGTTCCAGTA 
5 U l llllHHI GTCCCTGGTACAC 
1 n | | n I M I I CCTGCAGCGCCGA 
1 I I I I I 1 1 1 1 1 1 I CTTGGAGGGGGA 
1 1 H I II I M I I GAGGTCCTTCTGG 
nTTTTTTTn I CAACCGGCAGGAG 
10 1 1 II M I I I I 1 1 I GTGTCTGCATAC 
m II I I I M I I CGGGAGC.AGTTCG 
TTTTTTTTTTTTTG ACC CTG OAGCG 
| H I I I I I I I I I CAGAGAATTACCT 
1 1 1 I I I I I I I 1 1 I GGGTAGAAATCC 
15 I I I I I I I I I 1 I I I I ACGTGCACCAG 
TTTTTTTTTTTTCGCTGCAGGGTCA 
TTTTTTTTTTTTAGCCAGAAGGACA 
TTTTTTTTTTI I GTTCCAGTAGTCC 
HHI Il im 1 GGCCTGCTGCGGA 
20 T I 111 I III III t GCAGCGCCGAGG 
TTTTTTTTTTTTACTACGAGCTGGT 
TTTTTTTTTTTTCTGGGG CG GCCTG 
TTTTTTTTTTTTACAGCGACGTGGG 
T 1 1 II I I I I M I I G CCGGACAGGAT 
25 I I I 1 1 I I I I 1 I ICTGCCGTCCCTGG 
TTTTTTTTTTTTCATGGGCCCGACC 
1 I I I I I I I 1 I I I GTCCCATTAAACG 
1 1 1 1 | 1 n I 1 1 1 GTAACTGGTACAC 
M I I I I II I I I I A AGGACCTCCTGG 
30 II I I I n I I I I I CTCCTGGAGGAGA 
I I I I I I I I II I I GAGAATTACGTGT 
rTTTTTTTTn I CCTGATGAGGTGT 
TTTTTTTTTTTTCACAGGAGGAGCA 
I I I M I I I I I I I I GCCGTCCCTGGT 
35 I I I I M 1 1 M 1 1 GGGAGGAGTTCGC 

I > I I | M M I I I I GGACAGGAGGAA 
M 1 M II t II I I ACCCTGCAGCGTC 
U I I 1 1 I I 1 I 1 I CCGCCCGGAACTC 
TTTTTTTTTTI I GCTGCAGGGTCAC 

40 1 1 I I I I I I 11 1 I A CAGGACTATCCA 
TTTTTTTTTTTTGCGTACTCCTGCC 
TTTTTTTTTTTTCCGTAACTGGTGC 

I I 1 1 t I I I I I I T GCAGGAATGCTAC 
1TTTTTTTTTI I CCAGGCAGCATTC 

45 I I I I I I 1 I I I 1 I AACCGGGAGGAG 

ITTTTTTTTTTI I GGCCTC.AGGCGGA 
I I I I I I 1 1 I I I I ACTACGAGCTGGG 
1 1 1 u I M I I I I ATGAGGTGTACTG 
TTTTTTTTTTTTATACATCTACAAC 
50 II II I I Hllll I AACTGGTACACT 
U l U I I II U I CACGTAATTCTCT 
TTTTTTTTTTTTAGCATTCCTGCCG 
TTTTTTTTTTTTACTG GTAC ACTTA 
TTTTTTTTTTTTGGCAATGCCCGCT 
55 I U U n I I U I GCTTCGTGCTGGG 
TTTTTTTTTTTTCGCCCGGAACTCT 
M I I I I I I I I I I A CAGGACTGTCCA 
IHIIIUII I I r CCTCCAGGAGGT 
UIUUIIIU CCTTCTGGCTGTT 
60 IIHmilM 1 GTTCCAGTACTCC 
TTTTTTTTTTTTGCGCTGCAGGGTC
TTTTTTTTTTTTAACCTGGAGGAGA 
1 l M H I 11 I 1 II I CCTGCCGTAAC 
11 I U1I1IU I ACGCTGCAGGGTC 

5 11 I n 1 I I 1 I I 1 CCACAGAATTACC 
M I I I I I I 1 I I I CCAGAGAATTACG 
TTTTTTTTTTTTCGCCGAGTCCAGC 
1 1 1 I 1 I 1 I I M I AACAGGCAGGAGT 
TTTTTTTTTTTTTCCTCCAGGATGT 

10 I I 1 I 1 I I I I I I I AACCGGCAGGAGT 
I 1 1 1 1 1 I 11 M 1 CTCCAGAGAATTA 
1 1 1 I I 1 1 1 U I 1 GTTCCAGTACACC 
TTTTTTTTTTTTCTCCTGTAGGAGA 
I I I I 1 I 1 I I II I I I ACCTTTTCCAG 

15 I I I 1 1 I I II 1 I I GGAGGAGTTCGTG 
I 1 1 I 1 II I I I I I GAGGAGCTCGTGC 

I 1 I I I II I I Ml GCCGTAACTGGTG 

I I 1 1 I 1 1 I 11 1 1 GCCCGCTCCTCCT 
1 11 I I I I n I 1 I CGTCCCTGGAAAA 

20 M I I I I I I 11 1 I GCCGTCCCTGGAA 
I 1 I ] I I I I 1 I I ICCCCTCCAAGAAG 
I II I I I I I I 1 1 I GCTGCCTGGGTAG 
1 1 1 I 1 I I I I I 1 1 I CCAGTAGTCCTC 

I I I I 1 I I I I II I ATTCCTGCCGTAA 
25 I 11 1 I I 11 M I I CCTGGAAAAGGTA 

II I 11 I II I 1 1 I CGTCCCTGGTAC A 
ITTTTTTTTT1 I CTCCTCCAGGAAG 
TTTTTTTTTTTTTCTGATTCTGCCC 
1 I I I I I I i I I I I ATCTCCCTGCTGG 

30 I 1 I I I 1 ! I I I I I GAAGGACAACCTG 
I I I I 1 I I I I II I CGTGCACCAGTTA 
I I I I 1 I I 1 1 I I I CGGACAGGGTATG 
I M M I I I I I I I CGGACAGGATATG 
I I I M I I I I I I I GCACTCGGCGCTG 

35 1 1 1 1 I 1 1 I I 11 I ACACGTAATTCTC 
I I I I I I I I I I I I CGTAACTGGTACA 
I I 1 1 I I I I I I I IAATGACCCCCCAG 
I I 1 1 1 M 1 1 1 1 I r CTCTCCAGGAAG 
I I 1 I 1 I I 1 1 I I I CAGCGACGTGGGA 

40 I I 1 I I 11 I I I I I 1CCTGCCGGTTGT 
I I I I 1 I I I I I II GAAGGACATCCTG 
I 11 1 1 I I I I I I I GAAGGACCTCCTG 
I I I 1 1 1 I 1 1 I I I I GTTCCAGTACAC 
TTTTTTTTTTTTCAGAAGGACAACC 

45 I I I i I I I M I I I GCCTGATGAGGTG 

DQA1 

I I I M I I I I I I I CACAAGAGGCAAC 
TTTTTTTTTTTTCATAAGAGGCAAC

II l I I M I I I I I GAACAC.AGGCAAC 
I I I I I n n I I I ACATCCTCATCTG 

I I I I I I I II I I I GAGTGCCCATTGC 
TTTTTTTTTTTTCAGCCACAATGTC 

55 I M I I I I I I I I I ACAATCCCAGGGC 
I I I II n I I I I IACAACCCCAGGGC 
rTTTTTTTTTl I GTGGGCATTGTGG 
! II I I M I I I I I ATGGGCATTGTGG 
M I I I I I I M I I CCAACACCCTCAT 

M M I I I I I I I I AGACTGTGGTCTG
TTTTTTTTTTTTCCAACATCCTCAT
TTTTTTTTTm GGCCCACAGACAA 
l inilllMU CATGGGCATTGTG 
I I I I I I H I I I I AACATCCTCATCT 

5 TTTTTTTTTT1 1 CAACACCCTCATT 
H 1 I I 1 1 11 11 1 GACTGTGGTCTGC 
| I 1 I I I I It I I I AGCACTGGGGACT 
I I 1 I I 1 II I II I CTTAGATTTGACC 
TTTTTTTTTTTTTTTAGATTTGACC 

10 I I t I 1 I I 1 1 M I CGATGTTCAAGTT 
1 I I I II I n 1 I I CAATCCCAGGGCG 
| | | | | M n II I CCTCGGATGATGA 
TTTTTTTTTTTTTCCACATAGAACT 
TTTTTTTTTTTTAAATTCATGGGTG 

15 I I I I I I I I 1 I I ICAGCCACAATGCC 

I 1 1 1 I I 11 1 li I CACCATAAGAGGC 
| | U | 1 I I 1 I 1 1 I I CCTCCCTTCTG 
ITTTTTTTTT1 I AACTCTCCTCAG 

I I I I 1 I I I I 1 I 1 1 AAATCTCATCAG 
20 1 1 I I 1 M 1 1 n I CTCCTCCCTTCTG 

M I I I I 1 I 1 I 1 1 GTCAGCCACAATG 
I I I I I I 1 I I I I II CATTCCTTCTTC 
1 M I 1 I I I I It I CTTCCTCCCTTCT 
TTTTTTTTTTTTATAACTCTCCTCA 

TTTTTTTTTTTTGAGGCTCATCCAG
TTTTTTTTTTTTCAGGCTTGTCCAG
TTTTTTTTTTTTATGTTGACCACAG 
1 M I 11 I I 1 1 1 I AGTGCCCACCACA 
1 M I I I 11 I I I I GAACATCCTGATT 

30 1 I 1 I I I I I I I I I GGACCTGGAGAAG 
I 1 11 1 1 I I I > 1 1 CCCTCTGGCCAGT 
I 1 I 1 1 1 I I I I I I CCCTCTGGGR-AGT 
1 I I I I I I I I I 11 I 1 AC ACCGTAAGA 
TTTTTTTTTn I AGAAG ATTTG ACC 

35 1 1 I I I I I I 1 I 11 GAACTGGCCAGAG 
TTTTTTTTTTTI GCTACAACTCTAC 
1 | I I I 11 n I I 1 CAGTCTTACGGTC 
TTTTTTTTTTTTCAGTCTTATGGTC 

DQB1

I I I l I I I II II I ATCTTGCAGAGGA 
TTTTTTTTTTTTGGCTGGGGTGCTC 
n I 1 1 I 1 1 1 I I I GGGTCACCGCCCG 
45 I I I I I I I I I I I I CTGGGGCCGCCTG 
TTTTTTTTTTTTCTCGGCGCTAGGC 
TTTTTTTTTTTTGTATCTGGTCACA
ITTTTTTTTT1 I AACTACGAGGTGG 
TTTTTTTTTTTTCCAGTACTCGGCG 
50 I I I I 1 I I I I I I I CGGTTATAGATGT 
TTTTTTTTTn I GCAAGTCCTGGAG 
I M I I I I I I I I I I GGACACAACGCC 
I I 11 1 I I M I [ I CTGGGGCTGCCTG 
I I I 1 1 1 I 1 1 I I 1 GGCCTTAAACTGG 
55 * I I I II I 1 I 1 I I I I GTGTCTGCATAC 
I I I I i I 1 I 1 1 I I GTCGGAAAGGGCT 
11111111111 1 GGGTGTATCGGGT 
I I I I 1 I I i I I I I CCAGTACTCGGCA 
I 1 I I I I n □ I I GTAGACATCTCCA 
T 1 1 I I 1 I I I I I I AGGAAACGGGCGG
TTTTTTTTTTTTCACACCCCGCACG
M I NIMUM I CCGCTCGGGTCC 
TTTTTTTTTTTTAGCATCACCAGGA 
Milium f CC AGTTTAAGGGC 
5 TTTTTTTTTTI I ATAGCCACAAGGA 
1 n 11 I i I I I n GTATGCAGACACA 
TTTTTTTTTTT1 1 CCAGTACTCGGC 
TTTTTTTTTTTTAGCGCACGATCTC 
TTTTTTTTTTTTGGACATCCTGGAG 
10 TTTTTTTTTTTTTGGGGCTGCCTGA 
M I M M M M 1 GTCAGAAAGGGCT 
TTTTTTTTTTI I CAGGAGCCCTTTC 
M I M M M M I I GTCTCTTCCTGG 
TTTTTTTTT1 U ACACCCCGCACGC 
1 5 I I 1 M M U II I TGGTTTCGGAATG 
TTTTTTTTTTI 1 AACGGGACAGAGC 
TTTTTTTTTTTTGCTGGGGCCGCCT 
TTTTTTTTTm GAGGATTTCGTGT 
n M I I I I I M I GAGAGGAGTACGC 
20 11 M I 1 I I M M CACATCAAAGTCC 
M 1 11 11 I I I 1 I 'GCCAGGAGGAGAC 
TTTTTTTTTm GTACTCGGCGGCA 

TTTTTTTTTTTTCGCCAGTTGTCTC 
M 11 11 I 11 I 1 I AGGGGGGTGGACA 
25 M I 1 1 I I I M 1 1 AGATGTATCTGGT 
M I I 1 I 1 1 I I I 1 I GGGGGAGTTCCG 
TTTTTTTTTTTTTGTCTCCTCCTGG
1 1 1 I 11 1 I 1 1 1 TCACACTCTGTCCA 
M 1 1 I 1 1 11 I 1 1 GGAATGATCAGGA 
30 II 1 1 I » I I 1 1 1 I ATGGGGTCGCCGC 
M I I I I 1 1 1 1 I I CAGATCAAAGTCC 
111 11 111111 I A ACGGGACCGAGC 
TTTTTTTTTTI 1 AGGAGTACGTGCG 
TTTTTTTTTTI 1 'ATGTGACCAGATA 
35 1 II F"l II II I I 1 AGGGGCGGCCTGT 
TTTTTTTTTTTTCGCCGGTTGTCTC 
TTTTTTTTTTTTTGTAACCAGACAC 
ITTITTTTm 1 GTGAAGTAGCACA 
1 I 1 1 II I 1 1 I I I AGCGGCGACCCCA 
40 11 1 M II I I 11 I CACACCCTGTCCA 
1 I 1 M 1 1 I I 1 1 1 GTGTG ACC AG ATA 
M I I I 1 I 11 I 1 i I GGACCTTCCAGA 
M 1 1 1 I I M I 1 I ATCGGGTGGTGAC 
I M M I 1 I 1 I I I GTTTAAGGGCCTG 
45 1 I I 1 I I 1 1 I I 1 1 I G AAGTAGC AC AG 

I M 1 I I I I I I I 1 GCTCCAACTGGTA 
M I I I I 1 I I I 1 I CCTTAAACTGGTA 

I I M I 1 11 I 1 I I AGGAGGACGTGCG 

I M 1 1 I 1 I I I I I I CGTGCTGGGGCT 
50 11 M I I 1 1 1 I 1 1 CGCTGCTGGGGCT 

I I 1 1 I I I II 1 1 I CCAAGGAAGATCA 
M I I M I M II I ACCGCGCGGTGAC 
TTTTTTTTTTTTGCCCTTAAACTGG 
TTTTTTTTTTTI 1 GGTCACACCCCG 

55 I 1 1 I 1 I I 1 1 II ["GGGAGTTCCGGGC 
T I 1 I 1 1 1 1 I I I T AGGAGGAGACAAC 

I M I 1 11 1 M I 1 GGGTGGACACAAC 
TTTTTTTTTm I CTGCTCGGTGAC 

I I M 11 M I 1 1 1 1 GGGGCGGCTTGA 
II 11 I II II I I I GCGCACGTCCTCC
1 } 1 I I I I I I I 1 1 AGG ATTTCGTGTA
TTTTTTTTTTTTGCCTTAAACTGGA 

DRB345 

5 l | 1 1 1 1 I I I 1 I 1 GTACCTGGACAGA 
11 U1 I MIM I GTTCCTGGAGAGA 
1 1 1 Ml t I I I 1 I ACACTCATACTTA 
1 M | I I 1 I I 11 I ACACTC AGACTTA 
10 rTTTTTTTTTTI 1 CCTGGAGCAGGC 
TTTTTTTTTTTTTCGAAGCGCGCGT 
1 11 I 1 I I 1 1 I I 1 AATCTGCACAGAG 
TTTTTTTTTTTTAGGGCCCGCCTGT 
TTTTTTTTTTTTAGGACACTCTGGA
1 5 Mi l Mill 111 GTGTAAACCTCTC 
TTTTTTTTTTTTCTGTCGAAGCGCA 
rTTTTTTTTTTI GGGGCCGGGCTGT 
1 1 1 11 1 I 1 I 1 1 1 1 CTTCCAGGATGT 
rTTTTTTTTTI rAACTACGGAGTTG 
20 TTTTTTTTTTl 1 CAAGAAACATGGT 
1 1 I 1 I I I I 1 I I 1 I AACCAGGAGGAG 
T TTTTTTTTTTI 1 GAAGCTCTCCAC 
1 1 U I I M 1 1 I 1 GGGGCGGCCTGTC 
U i | | 1 I 11 1 I 1 GCGGCGCGCGTGT 
25 1 M 1 1 1 1 1 1 11 1 1 1 T CTTGGAGCTG 
1 I 1 | 1 11 1 M 1 11 1 CTCTTCCTGGC 
1 M 1 1 1 I I 11 I 1 AACTACGGGGTTG 
I I 1 I I 1 1 1 I 1 1 I GTATCTGATCAGG 
I 11 1 1 11 1 1 1 I I GGCCAGGTGGACA 
30 1 1 i 1 I I 1 1 1 1 I 1 GCCCCAGCTCCGT 
rTTTTTTTTTI I GGTTCCTGGAGAG 
iTTTTTTTTTI I GTCGAAGCGCACG 
TTTTTTTTTTTTGTGTCTGCAGTAG 
1 1 1 H I 11 I 1 I I GCTCCACTTGGCA 
35 11 1 11 I I 1 1 1 1 I I ACGGGGTTGGTG 
1 I 1 1 1 1 11 1 1 1 1 CGGTTCCTGCACA 
1 1 1 1 1 I I I 1 I I 1 1 CCAGTACTCGGC 
I 1 1 1 1 11 1 I 1 1 I I GTCCACCTCGGC 
1 1 I 1 I 1 11 I I 1 I 1 CTTCCTGGCCGT 
40 1 I I I 1 I I I M I I GGTGTCCACCAGG 
1 1 M I I 1 1 1 1 I I A CTCCGTAGTTGT 
1 M 11 II I I II I CACTCAGACTTAC 
1 11 1 1 1 1 I I 11 I GATGCTAGAAACA 
U U M 1 I I I I I GTGGAATGGAGAG 
45 1 1 U 11 I 1 I I U I AACCAAGAGGAG 
I i h | | l 1 l U 1 GTTCCGGAATGGC 
IMM1HM I I GTATCTGCAGTAG 
I TTTTTTTTTI I ACCTCCTGGTCTG 
iTTTTTTTTTI I AGCCAACAGGACT 
50 11 1 1 1 1 1 I I M 1 GCGGTTCCTGCAG 
1 1 1 I I I 1 111 1 I CGCGCCGCGGTGG 
rTTTTTTTTTI I GTAAACCTCTCCA 
1MMMI1III CTGATCAGGCTCC 
HIMlllini TCCAGGACTCGGC 
55 1 I I M 1 1 1 1 1 1 1 AACCATTCACAGA 
iTTTTTTTTTI I CGGGCCCTGGTGG 
iTTTTTTTTTn GTTCCGGAACGGC 
U 1 1 I M I I 1 I T GCGGCCCGCCTGT 
1 M II 11 1 11 1 I 1 CCTGGAAGACAC 
1 11 M I I 1 I M I GCCGGGTGGACAA
I i n t 1 I I I I I I CTGCTCCAGGATG
IMlimiM 1 CAACTACTGCAGA 
M I NIMUM GTACCTGGAGAGA 
TTTTTTTTTTTTACCTCTCCACTCC 
5 M M M MM M GTGAAGCTCTCCA 
TTTTTTTTTTTTCCGCGGCGCGCGT 
iTTTTTTTTTTI CTGATCAGGTTCC 
Mlllllllll I AATGGGACGGAGC 
n M Ml M 1 M I ATGGAAGTATCT 
10 M M 1 M 1 I M I I CTGCAGTAGGTG 
TTTTTTTTTTI I' CGGGCCGCGGTGG 
TTTTTTTTTTTTCTGTGCAGGAACC 
M M M i M I M CCAAGAGGAGGAC 
TTTTTTTTTTTTCAATTACTGCAGA 
15 TTTTTTTTTTI 1 CACCTACTGCAGA 
rTTTTTTTTTI 1 CTGCCTGGATAGA 
iTTTTTTTTTI I GTAATTGTCCACC 
TTTTTTTTTTTTCACCAGGGCCCGC 
TTTTTTTTTTTTTGCGGTACCTGGA 
20 M M M M M ( I CCTGCAGCACCAC 
1 1 M I 11 1 1 11 r GCGGCGCGCCTGT 
rTTTTTTTTTTI CCAGGACTCGGCA 
I 1 M M M M M GACACAACTACGG 
rTTTTTTTTTI 1 GATACAACTACGG 
25 1 1 M I i M I M I ACTCAG ACTTAC A 
TTTTTTTTTTTTTGAGACTTACACA 
TTTTTTTTTTTTTACGGGGTTGTGG 
M I M I M I M I GTAGTTGTCCACC 
rTTTTTTTTTI I AACCAGG AGG AGT 
30 M I I I I I M M I AACCAAGAGG AGT 
M II MM I I I M CCACAGCCCCGT 
M M M M M M C AGCC AG AAG G AC 
nTTTTTTTTI I GGAGGAGTTCCTG 
1IMIII11I11 GAACTCCTCCTGG 
35 TTTTTTTTTTTTAACCACTCACAGA 
1 M M I M M I I GGCCGGGCTGTTC 
I M I M M M M CTCACGAGTCCTG 
M I I M I I I I M GTCGAAGCGCAAG 
M II II I 11 M I CCTCCTGGTCTGT 

HLA-A 

M 1 M M I I M I I CAGTCTGTGAGT 
lll ll lllllll CCGCAGGCTCTCT 
45 I M M I I I I 1 1 I ATGAGGTATTTCT 
U I Hlllllll GGACATGGAGGTG 
TTTTTTTTTTTTCAGGTAGGCTCTC
Tr M II II M M I ACTCTTGGGGGC 
l llll l lllll l GGTCGCCAGGTCC 

50 T M M 11 I M M GGGAGCCCGCCCA 
M II I I M 1 M I CCGCTGCTCCGCC 
| I 1 l M I I 11 I I rGAAGGCCCAGTC 
1 1 | 1 I H M M 1 GCAGCCATACATC 
ITTTTTTTTTI I CCACTCCACGCAC 

55 rTTTTTTTTTI I CACGTCGCAGCCA 
I I I M II M I I 1 GGTCTGCCCGAGC 
TTTITTTTTTTTCAGGTAGACTCTC 
rTTTTTTTTTI 1 GGGAGACACGGAA 
ll l l l lllllll CCCGTCCACGCAC 

llllllllllll GTCCACTCGGTCA
TTTTTTTTTTi 1 ATCCAGAGGATGT

TTTTTTTTTTTTCGCGATCCGCAGG 

TTTTTTTTTTTTCCGGGACACGGAA 

I I I I I 11 1 I M 1 GGAGGAGGAACAG 
5 TTTTTTTTTTTTAAGTGAAGGCCCA 

I I i | i H | n 1 1 GGGGCTTGGGGAG 
I | | | | U H I 1 1 CAGACTAACCGAG 
TTTTTTTTTTTTGTCCTGGGGGGGT 
(TTTTTTTTTl I CGTCGTAAGCGTC 

10 i | 1 I I I 1 I I M I AGGTCCACTCGGT 
TTTTTTTTTTTTGGTAGGCTCTCAA 
TTTTTTTTTT1 1 GCGCGATCCGCAG 
TTTTTTTTTTTTGTGTCCTGGGTCT 
TTTTTTTTTTTTATCCAGATAATGT

15 TTTTTTTTTTTTCCGTCGTAGGCGT 



1 I I I M I 1 1 II I CGGACCCCCCCCA 
TTTTTTTTTTTTGCCGCATGGACCG 
I 1 I M I 1 I 1 1 I T GCTGCTCCGCCGC 
20 TTTTTTTTTTTTAGCGCAGGTCCTC 
TTTTTTTTTTTTCTACCTGGATGGC 
MlUIHill I GGTATTTCTTCAC 
M 1 1 H 1 I I I I I ATATGAAGGCCCA 
TTTTTTTTTTTTCCGTGTCTCCCCG 
25 I 1 1 1 I 1 I M n 1 C CGGCAGTGGAGA 
I i n I 1 n I I I ICGGACGCCCCCAA 
U 1 | 1 I 1 I 1 I I I CCGTGAGGCGGAG 
TTTTTTTTTTTTAGGAGACAGGGAA 
U | n I I I M I I AGAGCGAGGACGG 
30 I I IMHUM I GCACATGGCAGGT 
I 1 M I I I I I I 1 ICAGCTGCTCCGCC 
1 I 1 M I I 1 I I I I ATGAACAGCACGC 
1 | 1 U 1 | II I | r CCCGGCCCGGCAG 
{ 1 1 1 1 1 1 I 1 I 1 I GCAGCCTGAGAGT 
35 TTTTTTTTTTTTGACGGTCATG GC 
TTTTTTTTTTTTCCGTCGTAAGCGT 
1 | 1 1 1 1 M II I I GAGTATTGGGACC 

I 1 1 II I I I I II I CTGGCCTGGTTCT 

I I 1 1 || I 1 II I I ACCTCATGGAGTG 
TTTTTTTTTTTTAGCCGCCATGTCC

I I 1 I M 11 I 11 I CACGTGCCATCCA 
M I I I M | I I I I GGTCCCCAGGTTC 
I 1 I I I 1 I I 1 I I I A GG AG AAGACATA 
H I I 1 I 1 I I M 1 CTGCTGCTCCGCC 
45 TTTTTTTTTTTTTGACCCAGACCAG 
TTTTTTTTTTTTCGGGCGGAGCAGT 
iTTTTTTTTTI I AGGTTCGCTCGGT 
I 1 M | I 11 I 1 I I CATATGCGTCCTG 
1 I M I I 1 11 I 1 I CGTCCTGGGGGGG 

50 I I I II I I 11 1 I I GCACGTGCGTGGA 
1 I I I 1 1 I 1 1 1 I I GGTATTTCTACAC 
I I I I I I 1 1 1 11 I AGGAGCAGAGATA 
I I M I I I I I 1 M CCCGAACCCTCGT 
TTTTTTTTTTTTGCCACATGGGCCG 

55 TTTTTTTTTTTTAGCAGGAGGAGCC 
I 1 I I I II I I 11 I ATCCAGATGATGT 
TTTTTTTTTTTTGGATGGGGAGCAC 
TTTTTTTTTTTTGCACTGGCGCTTC
I 11 11 I 1 1 I M I AGCTTGTAAAGTG 

60 II 11 1 I 1 1 I I I I GATAATGTATGGC 




TTTTTTTTTTTTTCACACCCTCCAG 
TTTTTTTTTTI 1 CTACGTGGACAAC 
TTTTTTTTTTTTCGAGCGAACCTGG 
TTTTTTTTTTTTCGAGACAGCCTGC
5 1 1 1 I I M I M I 1 GGGCTACGTGGAC 
TTTTTTTTTTTTACCACCAGTACGC 
1 1 1 n I I I 11 I TGAGGATGTATGGC 
TTTTTTTTTTTi GATCTCAGCCGCC 
TTTTTTTTTTTTGATCTGAGCTGCC 
10 UMIHIIU I GATGATGTATGGC 
TTTTTTTTTTTTATACCTGGAGAAC 
T ' M 1 I 1 1 1 I I I I GATGTATGGCTGC 
TTTTTTTTTTTi I CCGCAGGTTCTC 
i M 1 I I I I I I I I GAGCAGAGATAAA 
15 TTTTTTTTTTI I GGGCTGGGAAGAC 
1 1 n I I 1 1 1 I 1 I GATGGGCAGGACT 
I n I I I I I I I I I I CACTTTCCCTGT 
TTTTTTTTTTTTCCCACGATGTGGA 
TTTTTTTTTTI I AGTCATATGCGTT 
20 TTTTTTTTTTTI GGCGGACATGGCG 
I I I I I I I I II I IGCTCCGCCTCACG 
MINIMUM CGTCGTAAGCGTT 
TTTTTTTTTTTTGATCATGTTTGGC

I n I I I I U II ICACGGACGCCCCC 
25 1 11 M I I I I I I IGCTCCTCCTGCTC 

II I II I I I I I I I ACTCACCGAGTGG 
II M U I II I I I AGTCATATGTGTC 

I M M II I II I I GGTCTGAGCTGCC 
I | | I n II II I n CCCACTTGCGCT 

30 TTTTTTTTTTI 1 GCCCACTCACAGA 
M M I I I I I I I I GGCTCAC.ATCACC 
1 I M I I I I I I I 1 GCTCTTGGACCGC 
1 M I I I 1 II I I I GAGAGCCTGCGGA 
I M I I U I M I I GGAACACACGGAA 

35 I I I I I I M I I I I CGGAACACACGGA 
TTTTTTTTTTI I CGTAAGCGTCCTG 
I | I I I n I I I I I GCCGGTGCGTGGA 

I 1 I I II I II I I I GCCGCATGGGCCG 
n I I 1 1 1 1 II I I CCAGAGCGAGGAC 

40 I n I I I II I I I I CCCAACGGGCCGC 

II I I I I II I I I I CGAGTGCGTGGAG 
1 1 I I U I I I I I I GCGAACCTGGGGA 
ITTTTTTTm I CGGGTACCAGCGG 
I I M I I I IN M I 'GAAGCGGGGCTC 

45 II 1 1 II M I I I I GGCGGCCCGTTGG 
1 1 1 I I I I I I I I I I CTGGGTCAGGGC 
M I I I I M I I I I GCCTCATGGGCCG 
TTTTTTTTTTTI CCATCCCGCTGCC 
| M I I I M M I I AGCTCAGACCACC 

50 I i I I I 1 1 I 1 I I I GTCGTAAGCGTCC 
I 1 I I I I 1 M I rrCCCGGCCGCGGGA 

I 1 I I 1 I I t 1 II I GGTCCCAATACTC 
M I 1 1 I 11 I I I 1CGTCCCAATACTC 
TTTTTTTTTTI 1 GTTCTCACACCAT 

55 I I I I I I I I I I I I I CCTCTGGATGGT 
nTTTTTTTTTT 1 CCCACTTGTGCT 
TTTTTTTTTTTTCCTGACCCAGACC 
rTTTTTTTTTTT 1 GAGAGCCCGCCC 

I I I 1 I I M 1 M 1 GAGTGCGTGGAGT 
nTTTTTTTTTT I ACATCATCTGGA
TTTTTTTTTTTTGATCCGCAGGTTC
I 1 1 I M | | | I I I I AGAGCAGGAGAG 
TTTTTTTTTTTTCCTGGCAGCGGGA 
TTTTTTTTTTTTTCATGGAGTGAGA 
5 1 1 1 II M I M I 1 CCGGCCGCGGGAA 
1 I I I I I 1 I I I I rCCAGGACACGGAG 
TTTTTTTTTTTTCCGGGACACGGAG 
TTTTTTTTTTTTGCAGCCACACATC 
TTTTTTTTTTTTGGATGGTGTGAGA 
10 TTTTTTTTTTTTAACATCATCTGGA 
M M 1 1 1 11 I I I I CCTCCTCCACAT 
TTTTTTTTTTTTTGGGCGGAGCAGT 
I I I | I I I 11 1 1 1 1 GCAGGGG ATGG A 
U | | n 1 111 1 1 CGCAGGAAGCGCC 
15 TTTTTTTTTTTTGGCCGTCATGGCG 
rrrTTTTTTTI 1 ATGCGTCCTGGGG 
TTTTTTTTTTTTATGCGTCTTGGGG 
TTTTTTTTTTTTTTTCCCTGTCTCC 
TTTTTTTTTTTTTCAGGGTGGCCTC 
20 H 1 M 11 1 1 1 1 1 GAGGAGGAACAGC 
I 1 1 I 1 I 1 1 I I 1 1 GCGCAGGGTCGCC 
1 M 11 11 11 1 1 I CAGCCAAACATCC 
1111 111 1 111 1 ACTTCTGGAAGGT 
TTTTTTTTTTTl 1 CCTCTGGACGGT 
25 1 I I 1 1 I 1 1 1 M 1 GGAGAAGAGATAC 
U U | 1 1 1 1 1 1 1 ATTCCGTGTCTCC 
TTTTTTTTTTTl 1 CAATCTGTGAGT 
TTTTTTTTTTTTGGCCCGTCGGGCG 
TTTTTTTTTTTTCGGCGGACATGGC 
30 1 1 I I 1 1 1 1 i 1 1 1 1 ACAAGCTGTGAG 
1 I I 1 I I 1 1 1 1 1 1 CGAACTGCGTGTC 
I 1 1 M 1 1 1 1 11 1 CGAGCTCCGTGTC 

I 1 I I 1 I I I I 1 1 T ACTCCACGCACCG 
111111 11 111 1 CTACGTGGACGAC 

HLA-C 

I I 1 11 11 1 1 11 1 1 GAGCTGGGAGCC 
TTTTTTTTTTTTATCACAACAGCCA 

40 1 M 1 I I I 1 II 1 1A GGCTCTCCGCTC 

I I I I I I I 1 1 1 I 1 GGAGTGGGAGCAG 
TTTTTTTTTTTl 1 CACACCCTCCAG 

I I M n 1 1 Ml 1 ACTCCACGCACAG 

I I I I 1 I 1 11 I I 1 GCCGTCGTAGGCG 
TTTTTTTTTTTTCGCGCAGAACCCC
rTTTTTTTTTi 1 AGTAGCCGCGCAG 
TTTTTTTTTTTTGGAGCGGACAGCC
1 1 1 M I 1 I I 1 1 1 CAGGTAGGCTCTC 
TTTTTTTTTTTTGGTTCGGGGCTCC 
50 M 11 n 1 1 111 1 GCCCCAAGCCCTC 
1 M 1 H 1 11 11 1 GGGCATGACCAGT 
TTTTTTTTTTTTGCGGCTCCGCGGC 
TTTTTTTTTTTTTCCAGTGGATGTA 
TTTTTTTTTTTTGGCATGACCAGTT 
55 T 1 1 1 n 1 1 1 I 1 1 CTCACTCGGTCAG 
U I 1 I II I M I r CAAGCCCTCCTCC 
I I 1 I I I I I I I I I 1 AGTTTCCGC.AGG 
I I 1 1 I I N 1 I M CAGGTCGCAGCCA 
11 1 111 1 11 1 11 CACTGCGATGAAG 
ITTTTTTTTTI 1 G GT ATG ACC AGTT
TT 1 1 I I I I I I I I ACAGCCAGGCCAG
U I I M 1 I I 1 1 I GAGGCGGAGCAGC 
TTTTTTTTTTTTTGGTTGTAGTAGC
U I I 1 II I 1 11 I ACCTGCGGAAACT 

5 TTTTTTTTTTi 1 CGGCCCAGGTCTC 
1 1 I I I I I 1 I 1 1 I GCTGGACGCAGCC 
ITTTTTTTTT1 1 CAGGTTCCGCAGG 
n 1 I 1 I N I I I ICCGCCAGGCACAG 
TTTTTTTTTTTTCCTCCTACACATC 

10 I I 1 1 1 I 1 I I M 1 ACGGCGGAGCAGC 
I 1 1 1 I M M I I 1 AGCGCGCGGAACC 
1 M | M I 1 M Ml 1 CACTCGGTCAG 
TTTTTTTTTTTTACGCCGCGAGTCC 
| H 1 I 1 1 I I 1 1 I I GGAGCAGGA.GGG 

15 TTTTTTTTTTTTGGGTATGACCAGT 
TTTTTTTTTTi I A TACCTGGAGAAC 

I M I 1 1 1 I 1 11 1 GGGTTCGGGGCTC 

I I 1 M 1 1 1 I I I T GACCGCTAGGACA 
1 1 1 1 1 1 1 II 1 I 1 ATCTGAGCCGCTG 

20 I I I l~I I 1 I I I 1 1 CGCGGAGAGCCCC 
■1 1 1 II | 1 U 1 1 I CCTGGCGCTTGTA 
1 | 1 U I I I 11 I 1 CCTGCGGAAACTA 
TTTTTTTTTTTTAGCGTCTCCTTCC 
TTTTTTTTTTTTTGGCGCCCCGAAC 
25 H I M I I U 1 II ATGATGTGAGACC 
1 11 1 I I I I I I I I CTCGGTGTCCTGG 
M 1 I I 1 I I I I I 1 GTAGTAGCCGCGT 
TTTTTTTTTTTTAGGATGTGAGACC 
1 l U | I I 11 1 I 1 GGTAGGCTCTCTG 
30 I I I I I I 1 I I M I AGCGTCTTCTTCC 

I M 1 I I I I 1 I I I CATAGGAGGAAGA 
U 1 1 I I I I I I I I GACAACCAGGACA 

II 1 1 I 1 I 1 1 I I I GCCGCGGGGAGCC 
1 1 1 H I I 1 1 1 1 1 GGTGAGGGGCTCT 

35 1 11 I 11 11 11 1 r CGAGGGGCTGCCA 
1 1 1 I 1 I I 1 I 1 I 1 GGGTATAACCAGT 
I M I I I I I I I I I 1 CCAGAATATGTA 
I | | | | | | I II I I GGGTGCAGGGCTC 
TTTTTTTTTTTTCGCGCGGAACCCC 

40 I I I I I 11 1 1 I I 1 I AGTAGCCGCGTA 
M 1 1 11 11 M I I AGCTGCTCTCAGG 
M 1 I M I I I I I I ACCGC ACG AACTG 
Tl I I 1 1 I I I M ICCGCAGGCTCACT 
1 1 I 1 I I 1 I 1 I I I GGTGTGAGACCCG 

45 f I I I I I 1 1 I I I I I GGAGCCCCG AAC 
1 1 1 I I I II I I I I AGCCGCGGGAGCC 
TTTTTTTTTTTTACTGCACGAACTG 
| 1 1 M I I n I 1 1 CCGCACGAACTGT 

I I II I I I I I I 1 1 GGTGCAGGGCTCC 
50 I I I I I I I U H I GCAGCAGGAGC.AG 

Mim i llH IIGAGTCTCTCATC 
MINIMUM CCGCCGTGTCCGC 

II 1 1 1 I 11 I I I I 1 CCACGCACAGGC 
1 1 M I II II I I I ACTCGGTCAGCCT 

55 1 I I 1 I I I 1 1 M 1 CACACC.ATCCAGA 
1 1 M | | | 11 I M CACACCCTCCAGA 
I I M I 1 I M 1 I I GCAGCAGGATGAG 
I I M II 1 11 I I 1 CAGCCACCACAGC 
M M I I 1 1 I 11 1 1 CGTGGCTGGCCT 

1 I 1 II II I II 11 1 ACGGCGGAGCAG
TTTCTCACACCATCCA

TTTTGCGGCGGAGCAG 

TTTTCTGAGCCGCCGT 

TTTGGCGGAGCAGCAG 

rmCCGCTGCGGACAC 

rTTTTATAACCAGTTCG 

ITTTCACATCCTCCAGA 

rTTTCCGTGTCCGCGGC 

nTTCGTGGACGACACA 

TTTTCCGCTGTGTCCGC 

TTTTGAAGAATGGGAAG

CLAIMS

1. A method of identifying a set of extendible primers for use in
the identification, typing or classification of a nucleic acid of known
sequence having known polymorphisms wherein:
i) all possible nucleotide sequences of a chosen length of the
nucleic acid are identified and their corresponding extendible primers,
ii) at least one extendible primer is removed from the set
wherein the at least one primer removed identifies a segment of the nucleic
acid identified by at least one other primer.
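
By way of illustration only, steps i) and ii) might be sketched as follows. The sketch assumes that a primer can be modelled as a k-mer read directly off each known sequence (strand complementarity is ignored), that its extension is the single base following its first match, and that a primer is removable when another primer gives the same extension pattern across all sequences. The function names are hypothetical and are not taken from the specification.

```python
def candidate_primers(seqs, k):
    """Step i): every distinct k-mer that is followed by at least one base."""
    primers = set()
    for s in seqs:
        for i in range(len(s) - k):  # stop early so an extension base exists
            primers.add(s[i:i + k])
    return sorted(primers)


def signature(primer, seqs):
    """Extension base obtained on each sequence, '-' where the primer is absent."""
    out = []
    for s in seqs:
        i = s.find(primer)  # assumption: only the first match is considered
        j = i + len(primer)
        out.append(s[j] if i != -1 and j < len(s) else "-")
    return tuple(out)


def drop_duplicates(primers, seqs):
    """Step ii): keep one primer per distinct extension pattern."""
    seen = {}
    for p in primers:
        seen.setdefault(signature(p, seqs), p)
    return sorted(seen.values())


seqs = ["AAGAAG", "CAGAAT"]          # two toy "alleles"
all_primers = candidate_primers(seqs, 2)
reduced = drop_duplicates(all_primers, seqs)
print(all_primers, reduced)
```

In this toy input the primers AG and GA yield the same extension on both sequences, so one of them carries no additional information and is dropped.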

2. The method of claim 1, wherein between steps i) and ii):
ia) potential extensions for each primer are identified with
respect to each nucleotide sequence,
ib) for each extendible primer the identified potential extensions
are compared to determine which pairs of sequences can be discriminated
by the primer.
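
Steps ia) and ib) can be pictured with a short sketch. It assumes, for illustration, that the potential extension is the single base following the primer's first match in a sequence, and that "no extension" counts as an outcome distinguishable from any base. The allele names and sequences are invented.

```python
from itertools import combinations


def extension(primer, seq):
    """Step ia): base added to the primer on this sequence, or None if no hybrid."""
    i = seq.find(primer)
    j = i + len(primer)
    return seq[j] if i != -1 and j < len(seq) else None


def discriminated_pairs(primer, seqs):
    """Step ib): pairs of sequences on which the primer extends differently."""
    ext = {name: extension(primer, s) for name, s in seqs.items()}
    return {(a, b) for a, b in combinations(sorted(seqs), 2) if ext[a] != ext[b]}


alleles = {"a1": "ACGTA", "a2": "ACGTC", "a3": "ACCTA"}  # hypothetical data
print(discriminated_pairs("AC", alleles))
print(discriminated_pairs("CGT", alleles))
```

Here the primer AC separates a3 from the other two alleles, whereas CGT separates all three from one another.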

3. The method of claim 1 or claim 2, wherein a matrix of primers 
and pairs of primer extensions is prepared in binary form and is subjected 
to analysis by a set covering problem (SCP) algorithm. 
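
A minimal sketch of such a binary matrix is given below, assuming that rows correspond to primers, columns to pairs of sequences, and that an entry is 1 when the primer discriminates the pair. The input dictionary and its contents are hypothetical.

```python
def binary_matrix(cover):
    """Build a primers x pairs 0/1 matrix from {primer: set of discriminated pairs}."""
    primers = sorted(cover)
    pairs = sorted(set().union(*cover.values()))
    matrix = [[1 if pair in cover[p] else 0 for pair in pairs] for p in primers]
    return primers, pairs, matrix


cover = {  # invented example: which allele pairs each primer can tell apart
    "AC": {("a1", "a3"), ("a2", "a3")},
    "CGT": {("a1", "a2"), ("a1", "a3"), ("a2", "a3")},
}
primers, pairs, matrix = binary_matrix(cover)
for name, row in zip(primers, matrix):
    print(name, row)
```

In set covering terms, each column must contain a 1 in at least one selected row, and the aim is to select as few rows as possible.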

4. The method of claim 3, wherein a greedy algorithm is used.
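
One common form of greedy algorithm for set covering is sketched below: at each step the primer that discriminates the largest number of still-undiscriminated pairs is selected. This is the textbook greedy heuristic and may differ in detail from the variant contemplated here; the data are invented.

```python
def greedy_cover(cover):
    """Pick primers until every pair is discriminated by at least one of them."""
    uncovered = set().union(*cover.values())
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(cover), key=lambda p: len(cover[p] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen


cover = {"p1": {1, 2, 3}, "p2": {3, 4}, "p3": {4, 5}, "p4": {1}}
print(greedy_cover(cover))  # ['p1', 'p3']
```

The greedy choice is fast but is not guaranteed to give the smallest possible set.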

5. The method of claim 3, wherein a CFT algorithm is used
which involves a Lagrangian relaxation heuristic.
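
The following sketch shows only the Lagrangian relaxation of the set covering problem, not a complete CFT implementation. It assumes unit cost per primer and a non-negative multiplier u[i] for each pair i; the resulting value is a lower bound on the number of primers needed, and primers with negative reduced cost are the attractive candidates. All names and data are hypothetical.

```python
def lagrangian_bound(cover, u, cost=1.0):
    """Lower bound L(u) = sum_i u_i + sum_j min(0, c_j - sum_{i in I_j} u_i)."""
    reduced = {p: cost - sum(u[i] for i in pairs) for p, pairs in cover.items()}
    bound = sum(u.values()) + sum(min(0.0, r) for r in reduced.values())
    return bound, reduced


cover = {"p1": {1, 2}, "p2": {2, 3}}   # both primers are needed to cover 1, 2, 3
u = {1: 1, 2: 0, 3: 1}                 # one multiplier per pair
bound, reduced = lagrangian_bound(cover, u)
print(bound, reduced)
```

A subgradient procedure would adjust the multipliers repeatedly to raise this bound; that part is omitted here.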

6. The method of any one of claims 3 to 5, wherein a set of core
primers is selected as a base for analysis by the SCP algorithm.

7. The method of any one of claims 3 to 6, wherein the set of
extendible primers identified by the SCP algorithm is subjected to a
redundancy check.
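
A redundancy check of this kind might, for example, test each selected primer in turn and discard it if the remaining primers still discriminate every pair. The sketch below is one possible reading under that assumption; the names and data are invented.

```python
def prune_redundant(chosen, cover):
    """Drop any primer whose pairs are all discriminated by the other kept primers."""
    target = set().union(*(cover[p] for p in chosen))
    kept = list(chosen)
    for p in list(chosen):
        rest = [q for q in kept if q != p]
        covered = set().union(*(cover[q] for q in rest))
        if covered >= target:  # p adds nothing, so leave it out
            kept = rest
    return kept


cover = {"p1": {1, 2}, "p2": {2, 3}, "p3": {1, 2, 3}}
print(prune_redundant(["p1", "p2", "p3"], cover))  # ['p3']
```

The outcome depends on the order in which primers are tested, so the result is a non-redundant set rather than necessarily the smallest one.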

8. A set of extendible primers, for use in the identification, typing 
or classification of a nucleic acid of known sequences having known 
polymorphisms, identified by the method of any one of claims 1 to 7. 



9. The set of extendible primers of claim 8, in the form of an
array.

10. The set of extendible primers of claim 8 or claim 9, for use in
the identification, classification or typing of an organism, allele or gene 
selected from class 1 HLA, class 2 HLA and 16S rRNA. 

11. The set of extendible primers of any one of claims 8 to 10,
wherein the primers are arrayed on a surface of a support in such a way 
that recognisable patterns are formed with different types or alleles. 



12. A set of extendible primers, for use in the identification, typing
or classification of a human leucocyte antigen (HLA) gene as indicated, the
set comprising about the number of primers indicated and being capable of
distinguishing about the number of alleles indicated:





            HLA gene    Number of Alleles    Number of Primers
Class I     HLA-A       91                   172
            HLA-B       200                  <1000
            HLA-C       47                   94
Class II    DPA-1       11                   26
            DPB-1       74                   130
            DQA-1       17                   130
            DQB-1       34                   84
            DRB-1       192                  <1000
            DRB345      35                   94

13. A set of extendible primers, for use in the identification, typing
or classification of 16S rRNA, wherein the set comprises about 210 primers
and is capable of distinguishing at least about 1207 different sequences.

14. The set of extendible primers of claim 12 or claim 13, wherein
the primers have variable segments substantially as set out in appendix 1
or appendix 2.

15. A method of identification, typing or classification of a nucleic
acid of known sequence having known polymorphisms, by the use of the
set of extendible primers as claimed in any one of claims 8 to 14, which
method comprises applying the nucleic acid or fragments thereof to the set
of extendible primers under hybridisation conditions, and effecting
template-directed chain extension of extendible primers that have formed
hybrids.

16. The method of claim 15, wherein the set of extendible primers 
is provided in the form of an array, and template-directed chain extension is 
effected using labelled chain-terminating nucleotide analogues. 

17. The method of claim 16, wherein template-directed chain
extension is effected using four different fluorescently-labelled
chain-terminating nucleotide analogues, and the results are analysed by
total internal reflection fluorescence or confocal microscopy.
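
For illustration, the read-out of such an array could be interpreted by comparing the observed base at each primer position with the pattern expected for each known allele. The sketch assumes a single homogeneous sample, one terminator base (or no signal, written None) per primer, and the same simplified extension model as a k-mer followed by one base; the alleles and primers are invented.

```python
def extension(primer, seq):
    """Base expected to be added to the primer on this sequence, or None."""
    i = seq.find(primer)
    j = i + len(primer)
    return seq[j] if i != -1 and j < len(seq) else None


def expected_pattern(primers, seq):
    """Expected signal at each array position for one known sequence."""
    return tuple(extension(p, seq) for p in primers)


def identify(observed, primers, alleles):
    """Names of all known alleles whose expected pattern equals the observed one."""
    return sorted(n for n, s in alleles.items()
                  if expected_pattern(primers, s) == tuple(observed))


alleles = {"a1": "ACGTA", "a2": "ACGTC", "a3": "ACCTA"}  # hypothetical data
primers = ["AC", "CGT"]
print(identify(("G", "C"), primers, alleles))  # ['a2']
```

A mixed or heterozygous sample would give more than one signal at some positions and would need a correspondingly richer comparison.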

18. The method of any one of claims 15 to 17, wherein the
nucleic acid is a PCR amplimer.

19. The method of any one of claims 15 to 18, wherein the
nucleic acid is HLA Class 1 or HLA Class 2 or 16S rRNA or a PCR
amplimer thereof.

20. The method of any one of claims 15 to 19, wherein a
dUTP/uracil-DNA-glycosylase system is used to break the nucleic acid into
fragments.

21. A kit for use in the identification, typing or characterisation of
a nucleic acid of known sequence having known polymorphisms,
comprising the set of extendible primers as claimed in any one of claims 8
to 14.

22. The kit of claim 21, comprising also a pair of primers for
effecting PCR amplification of the nucleic acid.

23. An array of sets of extendible primers as claimed in any one
of claims 8 to 14, for the simultaneous identification, typing or classification
of two or more different HLA genes.

24. A computer readable storage medium having a program
recorded thereon, wherein the program consists of instructional steps for
identifying a set of extendible primers for use in the identification, typing or
classification of a nucleic acid of known sequence having known
polymorphisms, the steps comprising:
i) identifying all possible nucleotide sequences of a chosen
length of the nucleic acid and their corresponding extendible primers,
ii) removing at least one extendible primer from the set wherein
the at least one primer removed identifies a segment of the nucleic acid
identified by at least one other primer.

25. Computer readable program implement consisting of
instructional steps for identifying a set of extendible primers for use in the
identification, typing or classification of a nucleic acid of known sequence
having known polymorphisms, the steps comprising:
i) identifying all possible nucleotide sequences of a chosen
length of the nucleic acid and their corresponding extendible primers,
ii) removing at least one extendible primer from the set wherein
the at least one primer removed identifies a segment of the nucleic acid
identified by at least one other primer.



